Status and OSR Project: Study 1 and 2 Data Analysis
Author
Benjamin J. Zubaly
Published
July 19, 2024
Introduction
This document tracks the data analysis for both Study 1 and 2 of our 2022 USRA project “Sexual selection and offspring sex ratio: Facial width-to-height ratio, dominance, and prestige”.
Currently, the important files in this project are this quarto document (which is where all of the code and output will be tracked) and the data folder. The data directory contains the clean versions of the individual participant data (from Study 1; Clean_Individual_Data.xlsx) and the facial ratings data (Clean_Ratings_Data.xlsx). It also contains a rich text file where I took notes on the preparation of these data sets (preparation_of_datasets.rtf).
Installing Packages
I am using the groundhog package here to ensure that packages and dependencies are all installed with their versions on April 1, 2024, which will enhance the reproducibility of this code. In the following chunk, I have suppressed the output of the groundhog.library() call because rendering to a PDF was not working due to an invalid character in the output.
# Install and load the groundhog package to allow for standardized installation of package versionsinstall.packages("groundhog")
Installing package into '/Users/benjaminzubaly/Library/R/arm64/4.4/library'
(as 'lib' is unspecified)
The downloaded binary packages are in
/var/folders/38/1ybnplc53zdb089bn6drqfn00000gn/T//RtmpbsMLRC/downloaded_packages
library(groundhog)
Attached: 'Groundhog' (Version: 3.2.0)
Tips and troubleshooting: https://groundhogR.com
# Creating vector of packages to be installedpkg <-c("interactions", "effsize", "car", "psych", "readxl", "stats", "lmtest", "dplyr", "sandwich")# Use groundhog to install and load other packagessuppressMessages({groundhog.library(pkg, "2024-06-01")})
Initial Variables in Data Sets
For reference, I will define all of the variables in the datasets, because I have created shorter names to make things easier in this analysis.
Clean_Individual_Data.xlsx:
ID = The randomly generated ID of the participant.
rec_meth = The recruitment method for the participant (0 = Prolific Recruitment, 1 = School Recruitment).
study1_part = The part of Study 1 of which the participant completed the main survey (2 = Part 2, 3 = Part 3, blank = Not applicable because they were a school recruit).
part_round = The round of recruitment, within the Study 1 part, in which the participant was recruited (blank for school recruits).
partner_completed = Whether or not the partner of the the participant also completed the study (0 = Partner did not complete, 1 = Partner did complete).
partner_ID = The ID of the partner of the participant (blank if missing).
study2_survey = The survey number the participant was included in for the facial ratings procedure of Study 2.
fWHR_ben = Benjamin’s measurements of facial width-to-height ratio (fWHR) for facial images.
fWHR_mad = Madison’s measurements of fWHR for facial images.
expression_not_neutral = Whether the participant had a neutral facial expression in their facial image, as determined by Benjamin (0 = Neutral, 1 = Not Neutral).
angled_face = Whether the participant’s face was angled in their facial image enough to alter the relationship between fWHR markers, as determined by Benjamin (0 = Not Angled, 1 = Angled).
obstructed_face = Whether the person’s facial markers of fWHR were obstructed (0 = No Obstruction, 1 = Obstruction).
useable_for_ratings = Whether the photograph was deemed eligible for the facial ratings procedure as per our criteria.
photo_resubmission = Whether the person resubmitted a facial photograph because their first photograph submission did not match our criteria.
sex = Sex of participant (0 = Female, 1 = Male).
ethnicity = Ethnicity of participant (1 = African, 2 = Black/African American, 3 = Caribbean, 4 = East Asian, 5 = Latino/Hispanic, 6 = Middle Eastern, 7 = Mixed, 8 = Native American or Alaskan Native, 9 = South Asian, 10 = White/Caucasian, 11 = White/Sepharic Jew, 12 = Black/British, 13 = White Mexican, 14 = Romani/Traveller, 15 = South East Asian, 16 = Indian, 17 = Filipino, 0 = Other)
age_first_bio_child = Age of the first biological child of the participant in years.
sex_first_bio_child = Sex of the first biological child of the participant (0 = Female, 1 = Male).
num_bio_child = Number of children of the participant.
has_adopted_child = Whether the participant has an adopted child (0 = No, 1 = Yes).
ppp_adjusted_income_usd = Income adjusted for purchasing power parity (PPP) in USD, to the nearest cent.
ISCO-08_code = The code assigned to participant’s occupation, according to the International Standard Classification of Occupations (International Labor Office, 2012).
ISEI-08_occ_status = The occupational status score according to the International Socio-Economic Index (ISEI-08) (Ganzeboom, 2010) based on the ISCO-08 code.
SSS = Subjective social status.
SR_dom_cheng = Self-reported dominance on the Dominance-Prestige Scales (Cheng et al., 2010).
PR_dom_cheng = Partner-reported dominance via our adaptation of the Dominance-Prestige Scales (Cheng et al., 2010).
SR_pres_cheng = Self-reported prestige on Dominance-Prestige Scales (Cheng et al., 2010).
PR_pres_cheng = Partner-reported prestige via our adaptation of the Dominance-Prestige Scales (Cheng et al., 2010).
ind_agg_full = the full score for the Indirect Aggression Scale Aggressor Version (IAS-A) (Forrest et al., 2005).
ind_agg_soc_excl = The social exclusion subscale of the IAS-A (Forrest et al., 2005).
ind_agg_mal_hum = The malicious humor subscale of the IAS-A (Forrest et al., 2005).
ind_agg_guilt = The guilt induction subscale of the IAS-A (Forrest et al., 2005).
photo_method = The method with which the facial photograph was taken (0 = Computer, 1 = Phone).
Clean_Ratings_Data.xlsx:
Start Date = Date and time the rater began the survey.
End Date = Date and time the rater finished the survey.
Custom Data = Identifier for the rater.
Survey # = The Study 2 survey number which the rater participated in.
Sex = The sex of the rater (“FEMALE” or “MALE”).
Age = The age of the rater in years.
typeofitem_survey#_page#_ID = This is how the rest of the columns are structured, and they represent the ratings of a facial characteristic for a particular facial image where typeofitem is either dominance, masculinity/feminity, or attractiveness, survey# is the number of the Study 2 survey that they facial images was included in, page# is the page of the survey that the facial image was presented on, and ID is the ID of the participant associated with the facial image. In total, there are 801 of these columns, three for each facial image rated and one for each characteristic of each image rated. They are arranged with dominance ratings first, then masculinity/feminity ratings, then attractiveness ratings.
Study 1 Analysis
Data Exploration
Descriptive Statistics
Continuous Variables
First, we will read in the data as a data frame called individualsdata and calculate descriptive statistics for the continuous variables in the sample.
# Reading in the individuals dataset as a data frame called individualsdata individualsdata <-read_excel("./data/Clean_Individual_Data.xlsx", sheet =1)# Defining continuous variables to describecont.variables <- individualsdata[c("age", "age_first_bio_child", "num_bio_child", "ppp_adjusted_income_usd", "ISEI-08_occ_status", "SSS", "SR_dom_cheng", "PR_dom_cheng", "SR_pres_cheng", "PR_pres_cheng", "SAT", "IPIP_dom")]# Calculating descriptive statistics with the psych package's describe() functiond.stats.cont.ind <-describe(cont.variables, na.rm =TRUE)# Displaying the resultprint(d.stats.cont.ind)
Income Adjusted for Purchasing Power Parity (in USD):
Mean (SD): 46,720.56 (36,994.16)
Median: 43,047.78
Range: 203,252.03 (0-203,252.03)
Occupational Status (ISEI-08)
Mean (SD): 60.23 (19.025)
Median: 65.01
Range: 75.16 (11.56-86.72)
Subjective Social Status:
Mean (SD): 50.63 (14.895)
Median: 50
Range: 85 (0-85)
Self-Reported Dominance:
Mean (SD): 2.92 (±1.06)
Median: 2.88
Range: 6 (1-7)
Partner-Reported Dominance:
Mean (SD): 2.58 (±.99)
Median: 2.44
Range: 4.44 (.888-5.333)
Self-reported Prestige:
Mean (SD): 5.04 (±.97)
Median: 5
Range: 5.111 (1.889-7)
Partner-Reported Prestige:
Mean (SD): 5.41 (±.92)
Median: 5.444
Range: 4.222 (2.777-7)
Simple Adjectives Test (SAT):
Mean (SD): 2.50 (±2.21)
Median: 2
Range: 11 (0-11)
IPIP Dominance:
Mean (SD): 27.57 (±8.90)
Median: 28
Range: 37 (11-48)
Categorical Variables
Now we will make our categorical variables into factor variables and calculate frequencies for each.
# Recoding variables currently numeric as factors# Recruitment method to prolific or schoolindividualsdata$rec_meth <-factor(individualsdata$rec_meth, levels =c(0, 1), labels =c("prolific", "school"))# Photo Resubmission to no_submission or had_submissionindividualsdata$photo_resubmission <-factor(individualsdata$photo_resubmission, levels =c(0, 1), labels =c("no_resubmission", "had_resubmission"))# Ethnicityindividualsdata$ethnicity <-factor(individualsdata$ethnicity, levels =c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), labels =c("Other", "African", "Black/African_American","Caribbean", "East Asian", "Latino/Hispanic", "Middle Eastern", "Mixed", "Native American or Alaskan Native", "South Asian", "White/Caucasian", "White/Sepharic Jew", "Black/British", "White Mexican", "Romani/Traveller", "South East Asian", "Indian", "Filipino"))# Nationalityindividualsdata$nationality <-factor(individualsdata$nationality, levels =c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24), labels =c("United Kingdom", "United States", "Ireland","Germany", "France", "Spain", "Canada", "Mexico", "Italy", "South Africa", "Hungary", "Zimbabwe", "Portugal", "Poland", "New Zealand", "Austria", "Bulgaria", "Australia", "Nigeria", "Phillipines", "Pakistan", "Brazil", "Turkey", "Netherlands"))# Expression not neutral to neutral or not_neutralindividualsdata$expression_not_neutral <-factor(individualsdata$expression_not_neutral, levels =c(0, 1), labels =c("neutral", "not_neutral"))# Sex to female or maleindividualsdata$sex <-factor(individualsdata$sex, levels =c(0, 1), labels =c("female", "male"))# Sex of first biological child to female or maleindividualsdata$sex_first_bio_child <-factor(individualsdata$sex_first_bio_child, levels =c(0, 1), labels =c("female", "male"))# Has adopted child to no or yesindividualsdata$has_adopted_child <-factor(individualsdata$has_adopted_child, levels =c(0, 1), labels =c("no", "yes"))# Photograph method to computer or phoneindividualsdata$photo_method <-factor(individualsdata$photo_method, levels =c(0, 1), labels =c("computer", "phone"))# Frequencies of categorical variablesrec_meth_table <-table(individualsdata$rec_meth) # Calculating frequenciesrec_meth_table # Displaying the frequencies
prolific school
257 8
photo_resub_table <-table(individualsdata$photo_resubmission) # Calculating frequenciesphoto_resub_table # Displaying the frequencies
no_resubmission had_resubmission
231 34
ethnicity_table <-table(individualsdata$ethnicity) # Calculating frequenciesethnicity_table # Displaying the frequencies
Other African
0 52
Black/African_American Caribbean
8 2
East Asian Latino/Hispanic
10 4
Middle Eastern Mixed
2 7
Native American or Alaskan Native South Asian
0 7
White/Caucasian White/Sepharic Jew
165 0
Black/British White Mexican
3 0
Romani/Traveller South East Asian
0 4
Indian Filipino
1 0
nationality_table <-table(individualsdata$nationality) # Calculating frequenciesnationality_table # Displaying the frequencies
United Kingdom United States Ireland Germany France
120 32 0 0 0
Spain Canada Mexico Italy South Africa
6 8 0 5 55
Hungary Zimbabwe Portugal Poland New Zealand
3 7 9 5 2
Austria Bulgaria Australia Nigeria Phillipines
1 1 2 2 1
Pakistan Brazil Turkey Netherlands
1 1 2 2
expression_neutral_table <-table(individualsdata$expression_not_neutral) # Calculating frequenciesexpression_neutral_table # Displaying the frequencies
neutral not_neutral
230 35
sex_table <-table(individualsdata$sex) # Calculating frequenciessex_table # Displaying the frequencies
female male
141 124
child_sex_table <-table(individualsdata$sex_first_bio_child) # Calculating frequencieschild_sex_table # Displaying the frequencies
female male
108 157
adopted_child_table <-table(individualsdata$has_adopted_child) # Calculating frequenciesadopted_child_table # Displaying the frequencies
no yes
264 1
photo_meth_table <-table(individualsdata$photo_method) # Calculating frequenciesphoto_meth_table # Displaying the frequencies
computer phone
48 215
Sex of participant (parent):
Females: n=141, 53.2%
Males: n=124, 46.8%
Sex of first biological child:
Female: n=108, 40.8%
Male: n=157, 59.2%
Ethnicity:
For ethnicity, there were a higher proportion of White/Caucasian (n=165) and African (n=52) respondents, with other ethnicities represented at n≤10.
Nationality:
For nationality, the most represented groups were participants from the United Kingdom (n=120), South Africa (n=55), and the United States (n=32), with all other nationalities at n≤9.
Has Adopted Children:
No: n=264, 99.6%
Yes: n=1, 0.4%
Data Visualization
Now I will create histograms of each continuous variable to visualize their distributions.
# Create histograms for each variable using the base hist functionfor (variable innames(cont.variables)) {# Create histogramhist(cont.variables[[variable]], main =paste("Histogram of", variable), xlab = variable)}
Each of the variables seems to tend toward a normal distribution except for SAT (which, being a count variable, seems to follow a poisson distribution), the number of biological children (which is likely influenced by it being our selection criteria and stopping rules), and income (which is commonly positively skewed).
Assessing Whether to Create Behavioral Dominance Composite
To determine whether self-and partner-reported dominance and self- and partner-reported prestige will be averaged into a composite measure, zero-order Pearson correlations will be calculated.
# Running the correlation between self-and partner-reported dominancedom.corr <-corr.test(individualsdata$SR_dom_cheng, individualsdata$PR_dom_cheng)print(dom.corr)
Call:corr.test(x = individualsdata$SR_dom_cheng, y = individualsdata$PR_dom_cheng)
Correlation matrix
[1] 0.48
Sample Size
[1] 230
These are the unadjusted probability values.
The probability values adjusted for multiple tests are in the p.adj object.
[1] 0
To see confidence intervals of the correlations, print with the short=FALSE option
# Runnign the correlation between self-and partner-reported prestigepres.corr <-corr.test(individualsdata$SR_pres_cheng, individualsdata$PR_pres_cheng)print(pres.corr)
Call:corr.test(x = individualsdata$SR_pres_cheng, y = individualsdata$PR_pres_cheng)
Correlation matrix
[1] 0.35
Sample Size
[1] 230
These are the unadjusted probability values.
The probability values adjusted for multiple tests are in the p.adj object.
[1] 0
To see confidence intervals of the correlations, print with the short=FALSE option
Although the correlations are significant, they are not as high as I would like them to be. We will first conduct the analysis with only the self-reported dominance and prestige, and then we will assess them as composite predictors.
Self- and partner-reported dominance: r=.48 (p<.01)
Self- and partner-reported prestige: r=.35 (p<.01)
Assessing Demographic Differences Between Parents of First-Born Sons and First-Born Daughters
To assess for biased demographic characteristics (sex and age) between parents of first-born sons and first-born daughters, we will conduct a chi-square test and t-test.
# Chi-square test to assess whether parental sex is associated with sex of first-born in our sample# Creating the frequencies tablepar.dem.freq.table <-table(individualsdata$sex_first_bio_child, individualsdata$sex)# Conducting the chi-square testchi.parent.dem <-chisq.test(par.dem.freq.table, correct =FALSE)# Displaying the resultprint(par.dem.freq.table)
# t-test to assess whether parental age differs between sex of child# QQ-plot and Shapir-Wilk test to assess normalityqqnorm(individualsdata$age); qqline(individualsdata$age)
shapiro.test(individualsdata$age)
Shapiro-Wilk normality test
data: individualsdata$age
W = 0.98512, p-value = 0.007436
# Levene's test for equality of variancesleveneTest(y = individualsdata$age, group = individualsdata$sex_first_bio_child)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.3623 0.5477
263
# Conducting the t.testt.parent.dem <-t.test(data = individualsdata, age ~ sex_first_bio_child, var.equal =TRUE)# Displaying the resultprint(t.parent.dem)
Two Sample t-test
data: age by sex_first_bio_child
t = -0.64581, df = 263, p-value = 0.519
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-1.656494 0.838259
sample estimates:
mean in group female mean in group male
33.54630 33.95541
Chi-Square Test:
The chi-square test indicates that there is not a significant relationship between parental sex and sex of first-born (χ²(1) = .15, p = .7).
T-test:
The t-test assumption of normality was not supported by the Shapiro-Wilk test, but a visual inspection of the QQ-plot (as wella as the histogram above) indicates that it does not deviate substantially from normality. The significant Shapir-Wilk test is likely overpowered with our sample size. The Levene’s test is not significant, indicating that we should accept the assumption of homogeneity of variances. Therefore, the t-test is ran with equal variances assumed. There is no significant difference between the mean age of parents of each sex (t(263) = -.65, p = .519).
Testing For Trivers-Willard Effects with Social Status Operationalizations:
To test for Trivers-Willard effects of social status, we will test three binary logistic regression models with the social status indicator, sex, and their interaction as predictors. The three social status indicators are income (ppp_adjusted_income_usd), occupational status (ISEI-08_occ_status), and subjective social status (SSS). Before fitting the models, we will first create standardized versions of those variables as z_income, z_occ_status, and z_SSS. In addition, I will recode sex so that female is the moderator (i.e., female = 1).
# Creating standardized scores for status indicatorsindividualsdata$z_income <-scale(individualsdata$ppp_adjusted_income_usd)individualsdata$z_occ_status <-scale(individualsdata$`ISEI-08_occ_status`)individualsdata$z_SSS <-scale(individualsdata$SSS)# Creating a new variable sex_2, which represents the participants' sex as 0 = male and 1 = femaleindividualsdata$sex <-ifelse(individualsdata$sex =="female", 1, 0)# Converting sex back to a factor variable with labelsindividualsdata$sex <-factor(individualsdata$sex, levels =c(0, 1), labels =c("male", "female"))
Model 1: Income
First, we will fit our model.
# Fitting the model with income, sex, and their interactionincome_status_model <-glm(sex_first_bio_child ~ z_income + sex + z_income:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(income_status_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating c_z_income, which represents standardized income after adding a constant of 10individualsdata$c_z_income <- individualsdata$z_income +10# Creating ln_c_z_income, which represents the natural log of the standardized income scores after adding a constant of 10individualsdata$ln_c_z_income <-log(individualsdata$c_z_income)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_income_model <-glm(sex_first_bio_child ~ c_z_income + sex + c_z_income:ln_c_z_income, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_income_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_income + sex + c_z_income:ln_c_z_income,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 36.77277 29.37186 1.252 0.211
c_z_income -12.66782 9.60823 -1.318 0.187
sexfemale -0.04325 0.29637 -0.146 0.884
c_z_income:ln_c_z_income 3.91271 2.89506 1.352 0.177
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 289.05 on 211 degrees of freedom
Residual deviance: 280.56 on 208 degrees of freedom
(53 observations deleted due to missingness)
AIC: 288.56
Number of Fisher Scoring iterations: 5
Because the interaction term is not significant (p = .18), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(income_status_model)
Call:
glm(formula = sex_first_bio_child ~ z_income + sex + z_income:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33125 0.20536 1.613 0.107
z_income 0.27414 0.22068 1.242 0.214
sexfemale 0.02617 0.29689 0.088 0.930
z_income:sexfemale 0.20572 0.33187 0.620 0.535
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 289.05 on 211 degrees of freedom
Residual deviance: 282.50 on 208 degrees of freedom
(53 observations deleted due to missingness)
AIC: 290.5
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_income_model <- income_status_model$null.deviance - income_status_model$devianceChi_income_model
[1] 6.541723
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_income_model <- income_status_model$df.null - income_status_model$df.residualdf_income_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_income_model <-1-pchisq(Chi_income_model, df_income_model)prob_Chi_income_model
[1] 0.08803148
Looking at the chi-square test, the model is marginally significant (χ²(3) = 6.54, p = .09). However, none of the other variables, including income, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(income_status_model$coefficients)
# Fitting the model with occupational status, sex, and their interactionoccupational_status_model <-glm(sex_first_bio_child ~ z_occ_status + sex + z_occ_status:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(occupational_status_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating c_z_occ_status, which represents standardized income after adding a constant of 10individualsdata$c_z_occ_status <- individualsdata$z_occ_status +10# Creating ln_c_z_occ_status, which represents the natural log of the standardized occupational status scores after adding a constant of 10individualsdata$ln_c_z_occ_status <-log(individualsdata$c_z_occ_status)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_occ_status_model <-glm(sex_first_bio_child ~ c_z_occ_status + sex + c_z_occ_status:ln_c_z_occ_status, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_occ_status_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_occ_status + sex + c_z_occ_status:ln_c_z_occ_status,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.0458 25.3360 -0.594 0.553
c_z_occ_status 5.0833 8.6475 0.588 0.557
sexfemale -0.1340 0.2606 -0.514 0.607
c_z_occ_status:ln_c_z_occ_status -1.5310 2.6506 -0.578 0.564
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 333.42 on 246 degrees of freedom
Residual deviance: 332.29 on 243 degrees of freedom
(18 observations deleted due to missingness)
AIC: 340.29
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .564), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(occupational_status_model)
Call:
glm(formula = sex_first_bio_child ~ z_occ_status + sex + z_occ_status:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4597 0.1897 2.423 0.0154 *
z_occ_status 0.2597 0.1955 1.329 0.1840
sexfemale -0.1458 0.2609 -0.559 0.5762
z_occ_status:sexfemale -0.3061 0.2620 -1.169 0.2426
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 333.42 on 246 degrees of freedom
Residual deviance: 331.25 on 243 degrees of freedom
(18 observations deleted due to missingness)
AIC: 339.25
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_occ_status_model <- occupational_status_model$null.deviance - occupational_status_model$devianceChi_occ_status_model
[1] 2.165361
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_occ_status_model <- occupational_status_model$df.null - occupational_status_model$df.residualdf_occ_status_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_occ_status_model <-1-pchisq(Chi_occ_status_model, df_occ_status_model)prob_Chi_occ_status_model
[1] 0.5388033
Looking at the chi-square test, the entire model is not significant (χ²(3) = 2.165, p = .539). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(occupational_status_model$coefficients)
# Fitting the model with subjective social status, sex, and their interactionSSS_status_model <-glm(sex_first_bio_child ~ z_SSS + sex + z_SSS:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(SSS_status_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating c_z_SSS, which represents standardized SSS after adding a constant of 10individualsdata$c_z_SSS <- individualsdata$z_SSS +10# Creating ln_c_z_SSS, which represents the natural log of the standardized SSS scores after adding a constant of 10individualsdata$ln_c_z_SSS <-log(individualsdata$c_z_SSS)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_SSS_status_model <-glm(sex_first_bio_child ~ c_z_SSS + sex + c_z_SSS:ln_c_z_SSS, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_SSS_status_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_SSS + sex + c_z_SSS:ln_c_z_SSS,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.12274 13.19672 -0.161 0.872
c_z_SSS 0.63992 4.53648 0.141 0.888
sexfemale -0.07983 0.25213 -0.317 0.752
c_z_SSS:ln_c_z_SSS -0.16722 1.39572 -0.120 0.905
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 357.50 on 261 degrees of freedom
AIC: 365.5
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .905), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(SSS_status_model)
Call:
glm(formula = sex_first_bio_child ~ z_SSS + sex + z_SSS:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.41245 0.18471 2.233 0.0256 *
z_SSS 0.16039 0.18929 0.847 0.3968
sexfemale -0.07916 0.25203 -0.314 0.7534
z_SSS:sexfemale -0.11409 0.25300 -0.451 0.6520
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 357.31 on 261 degrees of freedom
AIC: 365.31
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_SSS_status_model <- SSS_status_model$null.deviance - SSS_status_model$devianceChi_SSS_status_model
[1] 0.9452272
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_SSS_status_model <- SSS_status_model$df.null - SSS_status_model$df.residualdf_SSS_status_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_SSS_status_model <-1-pchisq(Chi_SSS_status_model, df_SSS_status_model)prob_Chi_SSS_status_model
[1] 0.814502
Looking at the chi-square test, the entire model is not significant (χ²(3) = .945, p = .815). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(SSS_status_model$coefficients)
Hypothesis 1 is that higher parental dominance increases the probability of having a son, and higher parental prestige neither increases nor decreases the probability of having a son. This hypothesis will be tested through three models. The first will have dominance and prestige as predictor variables of offspring sex. The second will have dominance, sex, and the interaction between dominance and sex as predictors of offspring sex. The third model will have prestige, sex, and the interaction between prestige and sex as predictors of offspring sex.
Before constructing the models and testing them, we will first standardize the predictor variables.
# Standardizing the two predictorsindividualsdata$z_SR_dom_cheng <-scale(individualsdata$SR_dom_cheng)individualsdata$z_SR_pres_cheng <-scale(individualsdata$SR_pres_cheng)
Model 1: Dominance and Prestige Main Effects
First, we will fit our model.
# Fitting the model with self-reported dominance and prestigehyp_1_main_effects_model <-glm(sex_first_bio_child ~ z_SR_dom_cheng + z_SR_pres_cheng, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_main_effects_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating new variables that add a constant to the standardized variablesindividualsdata$c_z_SR_dom_cheng <- individualsdata$z_SR_dom_cheng +10individualsdata$c_z_SR_pres_cheng <- individualsdata$z_SR_pres_cheng +10# Creating the natural log transformations of the variablesindividualsdata$ln_c_z_SR_dom_cheng <-log(individualsdata$c_z_SR_dom_cheng)individualsdata$ln_c_z_SR_pres_cheng <-log(individualsdata$c_z_SR_pres_cheng)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_main_effects_model <-glm(sex_first_bio_child ~ c_z_SR_dom_cheng + c_z_SR_pres_cheng + c_z_SR_dom_cheng:ln_c_z_SR_dom_cheng + c_z_SR_pres_cheng:ln_c_z_SR_pres_cheng, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_main_effects_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_SR_dom_cheng + c_z_SR_pres_cheng +
c_z_SR_dom_cheng:ln_c_z_SR_dom_cheng + c_z_SR_pres_cheng:ln_c_z_SR_pres_cheng,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.6553 26.9976 -0.024 0.981
c_z_SR_dom_cheng 1.0101 6.4042 0.158 0.875
c_z_SR_pres_cheng -1.4908 6.2775 -0.237 0.812
c_z_SR_dom_cheng:ln_c_z_SR_dom_cheng -0.2459 1.9275 -0.128 0.898
c_z_SR_pres_cheng:ln_c_z_SR_pres_cheng 0.4990 1.9104 0.261 0.794
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 354.58 on 260 degrees of freedom
AIC: 364.58
Number of Fisher Scoring iterations: 4
Because neither of the interaction terms are significant (p = .898 and p = .794), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_main_effects_model)
Call:
glm(formula = sex_first_bio_child ~ z_SR_dom_cheng + z_SR_pres_cheng,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3796 0.1260 3.012 0.00259 **
z_SR_dom_cheng 0.1885 0.1289 1.462 0.14375
z_SR_pres_cheng 0.1476 0.1259 1.173 0.24078
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 354.66 on 262 degrees of freedom
AIC: 360.66
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_main_effects_model <- hyp_1_main_effects_model$null.deviance - hyp_1_main_effects_model$devianceChi_hyp_1_main_effects_model
[1] 3.595436
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_main_effects_model <- hyp_1_main_effects_model$df.null - hyp_1_main_effects_model$df.residualdf_hyp_1_main_effects_model
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_main_effects_model <-1-pchisq(Chi_hyp_1_main_effects_model, df_hyp_1_main_effects_model)prob_Chi_hyp_1_main_effects_model
[1] 0.1656766
Looking at the chi-square test, the entire model is not significant (χ²(2) = 3.595, p = .166). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_main_effects_model$coefficients)
# Fitting the model with self-reported dominance, sex, and their interactionhyp_1_dom_model <-glm(sex_first_bio_child ~ z_SR_dom_cheng + sex + z_SR_dom_cheng:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_dom_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_dom_model <-glm(sex_first_bio_child ~ c_z_SR_dom_cheng + sex + c_z_SR_dom_cheng:ln_c_z_SR_dom_cheng, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_dom_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_SR_dom_cheng + sex +
c_z_SR_dom_cheng:ln_c_z_SR_dom_cheng, family = binomial(link = logit),
data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.27409 19.43070 -0.117 0.907
c_z_SR_dom_cheng 0.45174 6.34746 0.071 0.943
sexfemale -0.03523 0.25528 -0.138 0.890
c_z_SR_dom_cheng:ln_c_z_SR_dom_cheng -0.08004 1.91027 -0.042 0.967
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 356.02 on 261 degrees of freedom
AIC: 364.02
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .967), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_dom_model)
Call:
glm(formula = sex_first_bio_child ~ z_SR_dom_cheng + sex + z_SR_dom_cheng:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.42249 0.18638 2.267 0.0234 *
z_SR_dom_cheng 0.01806 0.18104 0.100 0.9205
sexfemale -0.02476 0.25789 -0.096 0.9235
z_SR_dom_cheng:sexfemale 0.34576 0.26435 1.308 0.1909
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 354.29 on 261 degrees of freedom
AIC: 362.29
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_dom_model <- hyp_1_dom_model$null.deviance - hyp_1_dom_model$devianceChi_hyp_1_dom_model
[1] 3.9667
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_dom_model <- hyp_1_dom_model$df.null - hyp_1_dom_model$df.residualdf_hyp_1_dom_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_dom_model <-1-pchisq(Chi_hyp_1_dom_model, df_hyp_1_dom_model)prob_Chi_hyp_1_dom_model
[1] 0.2650824
Looking at the chi-square test, the entire model is not significant (χ²(3) = 3.967, p = .265). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_dom_model$coefficients)
# Fitting the model with self-reported prestige, sex, and their interactionhyp_1_pres_model <-glm(sex_first_bio_child ~ z_SR_pres_cheng + sex + z_SR_pres_cheng:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_pres_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_pres_model <-glm(sex_first_bio_child ~ c_z_SR_pres_cheng + sex + c_z_SR_pres_cheng:ln_c_z_SR_pres_cheng, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_pres_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_SR_pres_cheng + sex +
c_z_SR_pres_cheng:ln_c_z_SR_pres_cheng, family = binomial(link = logit),
data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.643435 18.529248 -0.035 0.972
c_z_SR_pres_cheng 0.009025 6.217178 0.001 0.999
sexfemale -0.088859 0.252152 -0.352 0.725
c_z_SR_pres_cheng:ln_c_z_SR_pres_cheng 0.042329 1.892193 0.022 0.982
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 356.72 on 261 degrees of freedom
AIC: 364.72
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .982), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_pres_model)
Call:
glm(formula = sex_first_bio_child ~ z_SR_pres_cheng + sex + z_SR_pres_cheng:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.42429 0.18380 2.308 0.021 *
z_SR_pres_cheng 0.06392 0.17658 0.362 0.717
sexfemale -0.08512 0.25180 -0.338 0.735
z_SR_pres_cheng:sexfemale 0.17117 0.25265 0.678 0.498
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 356.26 on 261 degrees of freedom
AIC: 364.26
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_pres_model <- hyp_1_pres_model$null.deviance - hyp_1_pres_model$devianceChi_hyp_1_pres_model
[1] 1.999807
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_pres_model <- hyp_1_pres_model$df.null - hyp_1_pres_model$df.residualdf_hyp_1_pres_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_pres_model <-1-pchisq(Chi_hyp_1_pres_model, df_hyp_1_pres_model)prob_Chi_hyp_1_pres_model
[1] 0.5724468
Looking at the chi-square test, the entire model is not significant (χ²(3) = 2, p = .572). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_pres_model$coefficients)
Overall, this data is inconsistent with Hypothesis 1. While one measure of social status (income) approached significance as a predictor of sex of first born son, dominance was not a significant predictor.
Study 1 Exploratory Analyses
After our planned analysis of data, the results revealed other potential avenues of fruitful analysis, and there were variables in the data set that were not analyzed due to other variables being more pertinent to tests of our specific Hypothesis 1. Nevertheless, these follow up analyses and inclusion of previously neglected variables may be relevant to testing the broader Trivers-Willard hypothesis (TWH) and maternal dominance hypothesis (MDH). Specifically, this analysis will include the following:
A correlation analysis to determine what potential covariates may be controlled for in previous statistical models (binomial logistic regressions including status, sex, and status*sex) to yield clearer results.
Inclusion of these covariate(s) (X) in relevant statistical models (mentioned in 1), to determine whether status predicts OSR (either directly or indirectly through sex as a moderator) after holding X constant.
A new binomial logistic regression model including partner-reported dominance and partner-reported prestige with offspring sex as the outcome variable.
New binomial logistic regression models (with the same status, sex, and status*sex format) including previously neglected measures of status or dominance (i.e., partner-reported measures of dominance and prestige, SAT, and IPIP dominance).
New binomial logistic regression models to test Hypothesis 1 with composite dominance and prestige based on self-and peer-reported dominance and prestige together.
Correlational Analysis to Determine Potential Control Variables
To determine whether there are possible covariates to add to bivariate logistic regression models predicting offspring sex, we will look for relationships among the predictor variables we have already assessed. These variables are already saved in a data frame called cont.variables which we used to generate descriptive statistics, so we will use that data frame to create the correlation matrix.
# Using the psych package to generate the correlation tablestudy_1_cor_matrix_covariates <-corr.test(cont.variables)# Displaying the resultsprint(study_1_cor_matrix_covariates)
In the correlation table above, there is one relationship that suggests controlling for this variable may yield clearer results from a previous analysis. Age is moderately and significantly correlated with income (r = .30, p < .01). Because age could potentially be a negative indicator of condition (with respect to the TWH), by controlling for age in our model using income, sex, and income*sex as predictors of offspring sex (Model 1: Income), we may be able to further isolate the main effect of income, which was the closest to being a significant predictor in prior analyses despite reduced power (p = .18; n = 212). I see no other correlations in this table that indicate that inclusion of other covariates would, theoretically, help to clarify our results.
Rerunning Income Model With Age As Covariate
First, we will fit our model.
# Fitting the model with age, income, sex, and income*sexincome_status_model_2 <-glm(sex_first_bio_child ~ age + z_income + sex + z_income:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(income_status_model_2, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because we have already done this for income we only need to make the transformation for age. The code below takes the natural log of age for input into the Box-Tidwell model.
# Creating ln_age, which represents the natural log of the standardized income scores after adding a constant of 10individualsdata$ln_age <-log(individualsdata$age)
Now we need to fit and summarize a model with the main effects of the continuous variables with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction terms are not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_income_model_2 <-glm(sex_first_bio_child ~ age + c_z_income + sex + age:ln_age + c_z_income:ln_c_z_income, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_income_model_2)
Call:
glm(formula = sex_first_bio_child ~ age + c_z_income + sex +
age:ln_age + c_z_income:ln_c_z_income, family = binomial(link = logit),
data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 36.35251 30.10919 1.207 0.227
age 0.22950 1.21323 0.189 0.850
c_z_income -13.11202 9.85001 -1.331 0.183
sexfemale -0.03527 0.30111 -0.117 0.907
age:ln_age -0.04955 0.26697 -0.186 0.853
c_z_income:ln_c_z_income 4.04351 2.96578 1.363 0.173
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 289.05 on 211 degrees of freedom
Residual deviance: 280.51 on 206 degrees of freedom
(53 observations deleted due to missingness)
AIC: 292.51
Number of Fisher Scoring iterations: 5
Because neither interaction term is significant (p = .853 and p = .173), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(income_status_model_2)
Call:
glm(formula = sex_first_bio_child ~ age + z_income + sex + z_income:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.416495 1.031494 0.404 0.686
age -0.002445 0.028986 -0.084 0.933
z_income 0.277400 0.224029 1.238 0.216
sexfemale 0.021439 0.302231 0.071 0.943
z_income:sexfemale 0.206868 0.332478 0.622 0.534
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 289.05 on 211 degrees of freedom
Residual deviance: 282.50 on 207 degrees of freedom
(53 observations deleted due to missingness)
AIC: 292.5
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_income_model_2 <- income_status_model_2$null.deviance - income_status_model_2$devianceChi_income_model_2
[1] 6.548835
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_income_model_2 <- income_status_model_2$df.null - income_status_model_2$df.residualdf_income_model_2
[1] 4
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_income_model_2 <-1-pchisq(Chi_income_model_2, df_income_model_2)prob_Chi_income_model_2
[1] 0.1617392
Looking at the chi-square test, the model is marginally significant (χ²(4) = 6.549, p = .162). However, none of the other variables, including income, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(income_status_model_2$coefficients)
(Intercept) age z_income sexfemale
1.5166370 0.9975582 1.3196945 1.0216700
z_income:sexfemale
1.2298200
Rerunning Dominance and Prestige Model with Partner-Reported Measures
Before running the dominance and prestige model again with partner-reported measures instead of individual-reported measures, we need to transform our variables to standardize them.
# Standardizing the two predictorsindividualsdata$z_PR_dom_cheng <-scale(individualsdata$PR_dom_cheng)individualsdata$z_PR_pres_cheng <-scale(individualsdata$PR_pres_cheng)
Now, we will fit our model.
# Fitting the model with self-reported dominance and prestigehyp_1_main_effects_model_PR <-glm(sex_first_bio_child ~ z_PR_dom_cheng + z_PR_pres_cheng, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_main_effects_model_PR, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating new variables that add a constant to the standardized variablesindividualsdata$c_z_PR_dom_cheng <- individualsdata$z_PR_dom_cheng +10individualsdata$c_z_PR_pres_cheng <- individualsdata$z_PR_pres_cheng +10# Creating the natural log transformations of the variablesindividualsdata$ln_c_z_PR_dom_cheng <-log(individualsdata$c_z_PR_dom_cheng)individualsdata$ln_c_z_PR_pres_cheng <-log(individualsdata$c_z_PR_pres_cheng)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_main_effects_model_PR <-glm(sex_first_bio_child ~ c_z_PR_dom_cheng + c_z_PR_pres_cheng + c_z_PR_dom_cheng:ln_c_z_PR_dom_cheng + c_z_PR_pres_cheng:ln_c_z_PR_pres_cheng, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_main_effects_model_PR)
Call:
glm(formula = sex_first_bio_child ~ c_z_PR_dom_cheng + c_z_PR_pres_cheng +
c_z_PR_dom_cheng:ln_c_z_PR_dom_cheng + c_z_PR_pres_cheng:ln_c_z_PR_pres_cheng,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -36.260 32.088 -1.130 0.258
c_z_PR_dom_cheng 3.732 7.996 0.467 0.641
c_z_PR_pres_cheng 7.675 7.583 1.012 0.312
c_z_PR_dom_cheng:ln_c_z_PR_dom_cheng -1.065 2.409 -0.442 0.658
c_z_PR_pres_cheng:ln_c_z_PR_pres_cheng -2.289 2.312 -0.990 0.322
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 304.95 on 225 degrees of freedom
(35 observations deleted due to missingness)
AIC: 314.95
Number of Fisher Scoring iterations: 4
Because neither of the interaction terms are significant (p = .658 and p = .322), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_main_effects_model_PR)
Call:
glm(formula = sex_first_bio_child ~ z_PR_dom_cheng + z_PR_pres_cheng,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4116 0.1358 3.032 0.00243 **
z_PR_dom_cheng 0.2300 0.1425 1.615 0.10638
z_PR_pres_cheng 0.1687 0.1395 1.209 0.22651
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 306.19 on 227 degrees of freedom
(35 observations deleted due to missingness)
AIC: 312.19
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_main_effects_model_PR <- hyp_1_main_effects_model_PR$null.deviance - hyp_1_main_effects_model_PR$devianceChi_hyp_1_main_effects_model_PR
[1] 3.393547
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_main_effects_model_PR <- hyp_1_main_effects_model_PR$df.null - hyp_1_main_effects_model_PR$df.residualdf_hyp_1_main_effects_model_PR
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_main_effects_model_PR <-1-pchisq(Chi_hyp_1_main_effects_model_PR, df_hyp_1_main_effects_model_PR)prob_Chi_hyp_1_main_effects_model_PR
[1] 0.1832739
Looking at the chi-square test, the entire model is not significant (χ²(2) = 3.394, p = .183). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_main_effects_model_PR$coefficients)
Rerunning Status Models with Previously Neglected Measures of Status or Dominance
Now we will run new binomial logistic regression models (with the same status, sex, and status*sex format) while including previously neglected measures of status or dominance (i.e., partner-reported measures of dominance and prestige, the SAT, and the IPIP dominance).
Partner-Reported Dominance, Sex, and their Interaction
First, we will fit our model.
# Fitting the model with self-reported dominance, sex, and their interactionhyp_1_dom_model_PR <-glm(sex_first_bio_child ~ z_PR_dom_cheng + sex + z_PR_dom_cheng:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_dom_model_PR, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_dom_model_PR <-glm(sex_first_bio_child ~ c_z_PR_dom_cheng + sex + c_z_PR_dom_cheng:ln_c_z_PR_dom_cheng, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_dom_model_PR)
Call:
glm(formula = sex_first_bio_child ~ c_z_PR_dom_cheng + sex +
c_z_PR_dom_cheng:ln_c_z_PR_dom_cheng, family = binomial(link = logit),
data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.9067 24.1208 -0.494 0.622
c_z_PR_dom_cheng 3.6315 7.9103 0.459 0.646
sexfemale -0.1010 0.2708 -0.373 0.709
c_z_PR_dom_cheng:ln_c_z_PR_dom_cheng -1.0378 2.3840 -0.435 0.663
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 307.33 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 315.33
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .663), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_dom_model_PR)
Call:
glm(formula = sex_first_bio_child ~ z_PR_dom_cheng + sex + z_PR_dom_cheng:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.45847 0.19488 2.353 0.0186 *
z_PR_dom_cheng -0.03676 0.19625 -0.187 0.8514
sexfemale -0.08571 0.27247 -0.315 0.7531
z_PR_dom_cheng:sexfemale 0.44030 0.27840 1.582 0.1138
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 304.99 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 312.99
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_dom_model_PR <- hyp_1_dom_model_PR$null.deviance - hyp_1_dom_model_PR$devianceChi_hyp_1_dom_model_PR
[1] 4.59687
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_dom_model_PR <- hyp_1_dom_model_PR$df.null - hyp_1_dom_model_PR$df.residualdf_hyp_1_dom_model_PR
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_dom_model_PR <-1-pchisq(Chi_hyp_1_dom_model_PR, df_hyp_1_dom_model_PR)prob_Chi_hyp_1_dom_model_PR
[1] 0.2038108
Looking at the chi-square test, the entire model is not significant (χ²(3) = 4.597, p = .204). In addition, none of the other variables, except for the intercept, are significant, although the interaction between partner-reported dominance and sex approaches significance. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_dom_model_PR$coefficients)
Although the odds ratio for the interaction between partner-reported dominance and sex suggests a strong effect size, this term is still not significant (p = .085).
Partner-Reported Prestige, Sex, and their Interaction
First, we will fit our model.
# Fitting the model with self-reported prestige, sex, and their interactionhyp_1_pres_model_PR <-glm(sex_first_bio_child ~ z_PR_pres_cheng + sex + z_PR_pres_cheng:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_pres_model_PR, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_pres_model_PR <-glm(sex_first_bio_child ~ c_z_PR_pres_cheng + sex + c_z_PR_pres_cheng:ln_c_z_PR_pres_cheng, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_pres_model_PR)
Call:
glm(formula = sex_first_bio_child ~ c_z_PR_pres_cheng + sex +
c_z_PR_pres_cheng:ln_c_z_PR_pres_cheng, family = binomial(link = logit),
data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -30.09061 22.07674 -1.363 0.173
c_z_PR_pres_cheng 10.01869 7.41209 1.352 0.176
sexfemale -0.04448 0.27393 -0.162 0.871
c_z_PR_pres_cheng:ln_c_z_PR_pres_cheng -3.01890 2.25705 -1.338 0.181
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 306.96 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 314.96
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .181), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_pres_model_PR)
Call:
glm(formula = sex_first_bio_child ~ z_PR_pres_cheng + sex + z_PR_pres_cheng:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.46977 0.19746 2.379 0.0174 *
z_PR_pres_cheng -0.08243 0.21256 -0.388 0.6982
sexfemale -0.07759 0.27407 -0.283 0.7771
z_PR_pres_cheng:sexfemale 0.32853 0.27808 1.181 0.2374
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 307.38 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 315.38
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_pres_model_PR <- hyp_1_pres_model_PR$null.deviance - hyp_1_pres_model_PR$devianceChi_hyp_1_pres_model_PR
[1] 2.204522
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_pres_model_PR <- hyp_1_pres_model_PR$df.null - hyp_1_pres_model_PR$df.residualdf_hyp_1_pres_model_PR
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_pres_model_PR <-1-pchisq(Chi_hyp_1_pres_model_PR, df_hyp_1_pres_model_PR)prob_Chi_hyp_1_pres_model_PR
[1] 0.5310582
Looking at the chi-square test, the entire model is not significant (χ²(3) = 2.205, p = .531). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_pres_model_PR$coefficients)
Now we will look for Trivers-Willard effects with the SAT as our measure of dominance. First, we need to standardize the SAT variable.
# Creating standardized scores for SATindividualsdata$z_SAT <-scale(individualsdata$SAT)
Now, we will fit our model.
# Fitting the model with standardized SAT, sex, and their interactionSAT_status_model <-glm(sex_first_bio_child ~ z_SAT + sex + z_SAT:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(SAT_status_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating c_z_SAT, which represents standardized SAT after adding a constant of 10individualsdata$c_z_SAT <- individualsdata$z_SAT +10# Creating ln_c_z_SAT, which represents the natural log of the standardized SAT scores after adding a constant of 10individualsdata$ln_c_z_SAT <-log(individualsdata$c_z_SAT)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_SAT_status_model <-glm(sex_first_bio_child ~ c_z_SAT + sex + c_z_SAT:ln_c_z_SAT, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_SAT_status_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_SAT + sex + c_z_SAT:ln_c_z_SAT,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 26.39562 22.98298 1.148 0.251
c_z_SAT -8.36190 7.39522 -1.131 0.258
sexfemale -0.09818 0.25188 -0.390 0.697
c_z_SAT:ln_c_z_SAT 2.49858 2.21030 1.130 0.258
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 356.76 on 261 degrees of freedom
AIC: 364.76
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .258), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(SAT_status_model)
Call:
glm(formula = sex_first_bio_child ~ z_SAT + sex + z_SAT:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.42293 0.18398 2.299 0.0215 *
z_SAT 0.04748 0.18091 0.262 0.7930
sexfemale -0.09580 0.25116 -0.381 0.7029
z_SAT:sexfemale -0.08664 0.25150 -0.345 0.7305
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 357.99 on 261 degrees of freedom
AIC: 365.99
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_SAT_status_model <- SAT_status_model$null.deviance - SAT_status_model$devianceChi_SAT_status_model
[1] 0.2674562
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_SAT_status_model <- SAT_status_model$df.null - SAT_status_model$df.residualdf_SAT_status_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_SAT_status_model <-1-pchisq(Chi_SAT_status_model, df_SAT_status_model)prob_Chi_SAT_status_model
[1] 0.9660282
Looking at the chi-square test, the entire model is not significant (χ²(3) = .267, p = .966). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(SAT_status_model$coefficients)
Now we will look for Trivers-Willard effects with the IPIP_dom as our measure of dominance. First, we need to standardize the IPIP_dom variable.
# Creating standardized scores for IPIP_domindividualsdata$z_IPIP <-scale(individualsdata$IPIP_dom)
Now, we will fit our model.
# Fitting the model with standardized IPIP, sex, and their interactionIPIP_status_model <-glm(sex_first_bio_child ~ z_IPIP + sex + z_IPIP:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(IPIP_status_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating c_z_IPIP, which represents standardized IPIP after adding a constant of 10individualsdata$c_z_IPIP <- individualsdata$z_IPIP +10# Creating ln_c_z_IPIP, which represents the natural log of the standardized IPIP scores after adding a constant of 10individualsdata$ln_c_z_IPIP <-log(individualsdata$c_z_IPIP)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_IPIP_status_model <-glm(sex_first_bio_child ~ c_z_IPIP + sex + c_z_IPIP:ln_c_z_IPIP, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_IPIP_status_model)
Call:
glm(formula = sex_first_bio_child ~ c_z_IPIP + sex + c_z_IPIP:ln_c_z_IPIP,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 32.0576 21.8271 1.469 0.142
c_z_IPIP -10.4439 7.2274 -1.445 0.148
sexfemale -0.1106 0.2604 -0.425 0.671
c_z_IPIP:ln_c_z_IPIP 3.1557 2.1878 1.442 0.149
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 355.92 on 261 degrees of freedom
AIC: 363.92
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .149), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(IPIP_status_model)
Call:
glm(formula = sex_first_bio_child ~ z_IPIP + sex + z_IPIP:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4738 0.1936 2.447 0.0144 *
z_IPIP -0.1633 0.1920 -0.850 0.3951
sexfemale -0.1183 0.2625 -0.451 0.6522
z_IPIP:sexfemale 0.2672 0.2620 1.020 0.3079
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 358.26 on 264 degrees of freedom
Residual deviance: 357.04 on 261 degrees of freedom
AIC: 365.04
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_IPIP_status_model <- IPIP_status_model$null.deviance - IPIP_status_model$devianceChi_IPIP_status_model
[1] 1.217753
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_IPIP_status_model <- IPIP_status_model$df.null - IPIP_status_model$df.residualdf_IPIP_status_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_IPIP_status_model <-1-pchisq(Chi_IPIP_status_model, df_IPIP_status_model)prob_Chi_IPIP_status_model
[1] 0.7487497
Looking at the chi-square test, the entire model is not significant (χ²(3) = 1.21, p = .749). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(IPIP_status_model$coefficients)
Rerun Models Testing Hypothesis 1 with Composite Dominance and Prestige
Originally, we planned to only make composite measures of dominance and prestige if the self- and partner-reported measures correlated at r = .65 or higher (Assessing Whether to Create Behavioral Dominance Composite). However, this may have been a more stringent and arbitrary threshold than necessary. Here, we will rerun the models from Testing Hypothesis 1 with composite versions of dominance and prestige.
Creating the Composite Variables
First, we need to create the composite variables. To do this, we will simply take the mean of self- and partner-reported dominance and prestige, respectively.
# Taking the mean of self- and partner-reported measures as the composite measuresindividualsdata$comp_dom <- (individualsdata$SR_dom_cheng + individualsdata$PR_dom_cheng)/2individualsdata$comp_pres <- (individualsdata$SR_pres_cheng + individualsdata$PR_pres_cheng)/2
Next, we will standardize both of the variables.
# Standardizing the composite variablesindividualsdata$z_comp_dom <-scale(individualsdata$comp_dom)individualsdata$z_comp_pres <-scale(individualsdata$comp_pres)
Rerunning the Models
Model 1: Dominance and Prestige Main Effects Model
First, we will fit our model.
# Fitting the model with composite dominance and prestigehyp_1_main_effects_model_comp <-glm(sex_first_bio_child ~ z_comp_dom + z_comp_pres, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_main_effects_model_comp, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our continuous predictor variable to be positive by adding a constant. The code below adds a constant of 10 that makes all values for the variable positive before making the natural log transformation.
# Creating new variables that add a constant to the standardized variablesindividualsdata$c_z_comp_dom <- individualsdata$z_comp_dom +10individualsdata$c_z_comp_pres <- individualsdata$z_comp_pres +10# Creating the natural log transformations of the variablesindividualsdata$ln_c_z_comp_dom <-log(individualsdata$c_z_comp_dom)individualsdata$ln_c_z_comp_pres <-log(individualsdata$c_z_comp_pres)
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_main_effects_model_comp <-glm(sex_first_bio_child ~ c_z_comp_dom + c_z_comp_pres + c_z_comp_dom:ln_c_z_comp_dom + c_z_comp_pres:ln_c_z_comp_pres, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_main_effects_model_comp)
Call:
glm(formula = sex_first_bio_child ~ c_z_comp_dom + c_z_comp_pres +
c_z_comp_dom:ln_c_z_comp_dom + c_z_comp_pres:ln_c_z_comp_pres,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.14581 31.78512 -0.319 0.750
c_z_comp_dom 2.04702 7.09859 0.288 0.773
c_z_comp_pres 0.51831 7.67933 0.067 0.946
c_z_comp_dom:ln_c_z_comp_dom -0.56063 2.13394 -0.263 0.793
c_z_comp_pres:ln_c_z_comp_pres -0.09356 2.33369 -0.040 0.968
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 305.74 on 225 degrees of freedom
(35 observations deleted due to missingness)
AIC: 315.74
Number of Fisher Scoring iterations: 4
Because neither of the interaction terms are significant (p = .793 and p = .968), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_main_effects_model_comp)
Call:
glm(formula = sex_first_bio_child ~ z_comp_dom + z_comp_pres,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4124 0.1359 3.035 0.00241 **
z_comp_dom 0.1837 0.1396 1.316 0.18817
z_comp_pres 0.2068 0.1367 1.513 0.13018
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 305.81 on 227 degrees of freedom
(35 observations deleted due to missingness)
AIC: 311.81
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_main_effects_model_comp <- hyp_1_main_effects_model_comp$null.deviance - hyp_1_main_effects_model_comp$devianceChi_hyp_1_main_effects_model_comp
[1] 3.775693
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_main_effects_model_comp <- hyp_1_main_effects_model_comp$df.null - hyp_1_main_effects_model_comp$df.residualdf_hyp_1_main_effects_model_comp
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_main_effects_model_comp <-1-pchisq(Chi_hyp_1_main_effects_model_comp, df_hyp_1_main_effects_model_comp)prob_Chi_hyp_1_main_effects_model_comp
[1] 0.1513975
Looking at the chi-square test, the entire model is not significant (χ²(2) = 3.776, p = .151). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_main_effects_model_comp$coefficients)
In contrast to our hypothesis, the odds-ratio for prestige is higher than the odds-ratio for dominance, although neither are significant predictors.
Model 2: Dominance, Sex, and their Interaction
First, we will fit our model.
# Fitting the model with composite dominance, sex, and their interactionhyp_1_dom_model_comp <-glm(sex_first_bio_child ~ z_comp_dom + sex + z_comp_dom:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_dom_model_comp, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_dom_model_comp <-glm(sex_first_bio_child ~ c_z_comp_dom + sex + c_z_comp_dom:ln_c_z_comp_dom, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_dom_model_comp)
Call:
glm(formula = sex_first_bio_child ~ c_z_comp_dom + sex + c_z_comp_dom:ln_c_z_comp_dom,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.70595 21.52091 -0.172 0.863
c_z_comp_dom 0.99028 7.01846 0.141 0.888
sexfemale -0.07188 0.27167 -0.265 0.791
c_z_comp_dom:ln_c_z_comp_dom -0.24924 2.11020 -0.118 0.906
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 308.04 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 316.04
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .906), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_dom_model_comp)
Call:
glm(formula = sex_first_bio_child ~ z_comp_dom + sex + z_comp_dom:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.46374 0.19606 2.365 0.018 *
z_comp_dom -0.05059 0.18774 -0.269 0.788
sexfemale -0.05120 0.27525 -0.186 0.852
z_comp_dom:sexfemale 0.46692 0.28549 1.636 0.102
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 305.31 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 313.31
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_dom_model_comp <- hyp_1_dom_model_comp$null.deviance - hyp_1_dom_model_comp$devianceChi_hyp_1_dom_model_comp
[1] 4.275591
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_dom_model_comp <- hyp_1_dom_model_comp$df.null - hyp_1_dom_model_comp$df.residualdf_hyp_1_dom_model_comp
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_dom_model_comp <-1-pchisq(Chi_hyp_1_dom_model_comp, df_hyp_1_dom_model_comp)prob_Chi_hyp_1_dom_model_comp
[1] 0.233202
Looking at the chi-square test, the entire model is not significant (χ²(3) = 4.276, p = .102). In addition, none of the other variables, except for the intercept, are significant, although the interaction with sex is marginally significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_dom_model_comp$coefficients)
Although the odds-ratio for the interaction term is a pretty large effect size, it is still not a significant predictor.
Model 3: Prestige, Sex, and their Interaction
First, we will fit our model.
# Fitting the model with self-reported prestige, sex, and their interactionhyp_1_pres_model_comp <-glm(sex_first_bio_child ~ z_comp_pres + sex + z_comp_pres:sex, family =binomial(link = logit), data = individualsdata)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_1_pres_model_comp, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
Now we need to fit and summarize a model with the main effects of the continuous variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that the interaction term is not significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the Box-Tiddwell procedureBT_test_hyp_1_pres_model_comp <-glm(sex_first_bio_child ~ c_z_comp_pres + sex + c_z_comp_pres:ln_c_z_comp_pres, family =binomial(link = logit), data = individualsdata)# Summarizing the modelsummary(BT_test_hyp_1_pres_model_comp)
Call:
glm(formula = sex_first_bio_child ~ c_z_comp_pres + sex + c_z_comp_pres:ln_c_z_comp_pres,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.31548 22.54720 -0.324 0.746
c_z_comp_pres 2.15340 7.53004 0.286 0.775
sexfemale -0.06347 0.27207 -0.233 0.816
c_z_comp_pres:ln_c_z_comp_pres -0.59702 2.28756 -0.261 0.794
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 307.45 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 315.45
Number of Fisher Scoring iterations: 4
Because the interaction term is not significant (p = .794), the assumption of a linear relationship between continuous IVs and the logit of the DV is satisfied.
Summary of the Model
Now we will summarize the model, along with the Chi-square test for whether the model fits significantly better than the model with only the intercept.
# Producing the summary of the modelsummary(hyp_1_pres_model_comp)
Call:
glm(formula = sex_first_bio_child ~ z_comp_pres + sex + z_comp_pres:sex,
family = binomial(link = logit), data = individualsdata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.45309 0.19573 2.315 0.0206 *
z_comp_pres 0.05463 0.20021 0.273 0.7850
sexfemale -0.05960 0.27304 -0.218 0.8272
z_comp_pres:sexfemale 0.25000 0.27487 0.910 0.3631
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 309.59 on 229 degrees of freedom
Residual deviance: 306.69 on 226 degrees of freedom
(35 observations deleted due to missingness)
AIC: 314.69
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_1_pres_model_comp <- hyp_1_pres_model_comp$null.deviance - hyp_1_pres_model_comp$devianceChi_hyp_1_pres_model_comp
[1] 2.893484
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_1_pres_model_comp <- hyp_1_pres_model_comp$df.null - hyp_1_pres_model_comp$df.residualdf_hyp_1_pres_model_comp
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_1_pres_model_comp <-1-pchisq(Chi_hyp_1_pres_model_comp, df_hyp_1_pres_model_comp)prob_Chi_hyp_1_pres_model_comp
[1] 0.4083411
Looking at the chi-square test, the entire model is not significant (χ²(3) = 2.893, p = .408). In addition, none of the other variables, except for the intercept, are significant. I will exponentiate the coefficients to reveal the odds-ratio of the coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_1_pres_model_comp$coefficients)
Like the models that used only self-reported dominance and prestige, these models do not provide support for Hypothesis 1.
Clean-Up From Study 1 Analysis
Before going on to Study 2, we will remove unnecessary objects in the environment, write the individualsdata data frame to the hard drive as data_after_study_1_analysis.csv, and delete unnecessary variables we have created from the individualsdata data frame.
# Creating a vector containing the names of all objects in the environmentobjects <-ls()# Removing all objects except for the "individualsdata" data framerm(list = objects[objects !="individualsdata"])rm("objects")# Writing the individualsdata data frame to the data folderwrite.csv(individualsdata, file ="./data/data_after_study_1_analysis.csv")# Remove the variables we have created from the individualsdata data frameindividualsdata <-subset(individualsdata, select = ID:photo_method)
Study 2 Analysis
Reliability of Facial Ratings and Facial Width-to-Height Ratio
Data Cleaning for Ratings
Before we can proceed with analyses, we must calculate the average facial ratings for dominance, masculinity/femininity, and attractiveness, but, first, we will check the reliability of the facial ratings. First, let’s read in the data for the facial ratings and take a look at it.
# Reading in the data as the data frame ratingsdataratingsdata <-read_excel("./data/Clean_Ratings_Data.xlsx", sheet =1)
The raters are in rows, and the facial images identifiers are in the columns (after the demographic data). Also, the dominance ratings, masculinity/femininity ratings, and attractiveness ratings are all in the same data frame here. To allow for easier reliability analysis, I will clean this up a bit. In the following chunks, I will create a data frame that isolates each of the different types of ratings with the ID as the row identifier. I will save the resultant data frames as CSV files in the data folder.
For the dominance ratings:
# Extract the relevant columns for dominance ratings using grep and transpose the data as transposed_dominance_datatransposed_dominance_data <-as.data.frame(t( ratingsdata[, grep("dominance", names(ratingsdata))] ))# Remove extraneous features of the row names of the transposed_dominance_data data frame to only include the IDrow.names(transposed_dominance_data) <-gsub("dominance_[0-9]+_[0-9]+_", "", row.names(transposed_dominance_data))# Write the transposed data to a CSV file called transposed_dominance_data.csv with empty strings for missing valueswrite.csv(transposed_dominance_data, "./data/transposed_dominance_data.csv", row.names =TRUE, na ="")
For the masculinity/femininity ratings:
# Extract the relevant columns for masculinity/femininity ratings using grep and transpose the data as transposed_masculinityfemininity_data (note that I accidently spelled "femininity" as "feminity" in the Python script that cleaned up this dataset before importing the ratings data, which is why the grep function has "masculinity/feminity" in it here)transposed_masculinityfemininity_data <-as.data.frame(t( ratingsdata[, grep("masculinity/feminity", names(ratingsdata))] ))# Remove extraneous features of the row names of the transposed_masculinityfemininity_data data frame to only include the ID (note that I accidently spelled "femininity" as "feminity" in the Python script that cleaned up this dataset before importing the ratings data, which is why the gsub function has "masculinity/feminity" in it here)row.names(transposed_masculinityfemininity_data) <-gsub("masculinity/feminity_[0-9]+_[0-9]+_", "", row.names(transposed_masculinityfemininity_data))# Write the transposed data to a CSV file called transposed_masculinityfemininity_data.csv with empty strings for missing valueswrite.csv(transposed_masculinityfemininity_data, "./data/transposed_masculinityfemininity_data.csv", row.names =TRUE, na ="")
For the attractiveness ratings:
# Extract the relevant columns for attractiveness ratings using grep and transpose the data as transposed_attractiveness_datatransposed_attractiveness_data <-as.data.frame(t( ratingsdata[, grep("attractiveness", names(ratingsdata))] ))# Remove extraneous features of the row names of the transposed_attractiveness_data data frame to only include the IDrow.names(transposed_attractiveness_data) <-gsub("attractiveness_[0-9]+_[0-9]+_", "", row.names(transposed_attractiveness_data))# Write the transposed data to a CSV file called transposed_attractiveness_data.csv with empty strings for missing valueswrite.csv(transposed_attractiveness_data, "./data/transposed_attractiveness_data.csv", row.names =TRUE, na ="")
Reliability of Facial Ratings Within Types:
Now that we have separated our data out into the different types of ratings, we can assess the degree of inter-rater agreement within rating types using intraclass correlation coefficients (ICCs) with the ‘psych’ package (Revelle & Revelle, 2015). The documentation of the psych package indicates that it uses the six varieties of ICCs outlined by Shrout & Fleiss (1979). For our purposes, we are (1) not interested in generalizing to the population of raters but rather the population of faces, and we are looking for (2) the consistency of ratings to calculate (3) average scores. Therefore, we will use the ICC(3,k), which is based on a two-way mixed model. The ICC(3,k) is actually identical to Chronbach’s alpha.
For the dominance ratings:
# Calculate ICC using a two-way mixed-effects model for average ratings and consistencyicc_result_dominance <-ICC(transposed_dominance_data)# View the resultprint(icc_result_dominance)
Call: ICC(x = transposed_dominance_data)
Intraclass correlation coefficients
type ICC F df1 df2 p lower bound upper bound
Single_raters_absolute ICC1 0.12 43 255 76288 0 0.11 0.15
Single_random_raters ICC2 0.12 56 255 75990 0 0.11 0.15
Single_fixed_raters ICC3 0.16 56 255 75990 0 0.13 0.18
Average_raters_absolute ICC1k 0.98 43 255 76288 0 0.97 0.98
Average_random_raters ICC2k 0.98 56 255 75990 0 0.97 0.98
Average_fixed_raters ICC3k 0.98 56 255 75990 0 0.98 0.99
Number of subjects = 256 Number of Judges = 299
See the help file for a discussion of the other 4 McGraw and Wong estimates,
For masculinity/femininity ratings:
# Calculate ICC using a two-way mixed-effects model for average ratings and consistencyicc_result_masculinityfemininity <-ICC(transposed_masculinityfemininity_data)# View the resultprint(icc_result_masculinityfemininity)
Call: ICC(x = transposed_masculinityfemininity_data)
Intraclass correlation coefficients
type ICC F df1 df2 p lower bound upper bound
Single_raters_absolute ICC1 0.28 119 255 76288 0 0.25 0.32
Single_random_raters ICC2 0.28 235 255 75990 0 0.25 0.33
Single_fixed_raters ICC3 0.44 235 255 75990 0 0.40 0.48
Average_raters_absolute ICC1k 0.99 119 255 76288 0 0.99 0.99
Average_random_raters ICC2k 0.99 235 255 75990 0 0.99 0.99
Average_fixed_raters ICC3k 1.00 235 255 75990 0 0.99 1.00
Number of subjects = 256 Number of Judges = 299
See the help file for a discussion of the other 4 McGraw and Wong estimates,
For attractiveness ratings:
# Calculate ICC using a two-way mixed-effects model for average ratings and consistencyicc_result_attractiveness <-ICC(transposed_attractiveness_data)# View the resultprint(icc_result_attractiveness)
Call: ICC(x = transposed_attractiveness_data)
Intraclass correlation coefficients
type ICC F df1 df2 p lower bound upper bound
Single_raters_absolute ICC1 0.16 59 255 76288 0 0.14 0.19
Single_random_raters ICC2 0.16 104 255 75990 0 0.14 0.19
Single_fixed_raters ICC3 0.26 104 255 75990 0 0.22 0.29
Average_raters_absolute ICC1k 0.98 59 255 76288 0 0.98 0.99
Average_random_raters ICC2k 0.98 104 255 75990 0 0.98 0.99
Average_fixed_raters ICC3k 0.99 104 255 75990 0 0.99 0.99
Number of subjects = 256 Number of Judges = 299
See the help file for a discussion of the other 4 McGraw and Wong estimates,
These ICCs (all > .98) are very high, indicating strong inter-rater reliability of facial ratings. Thus, we will calculate the average rating for each face for later analyses.
Calculating Average Facial Ratings
The following will calculate average facial ratings for each type of rating and add them to the main dataset structured with individuals, individualsdata.
# Calculate row means for all columns from the transposed datasets and ignore missing values (the IDs are in the row number positions, so we do not need to worry about starting with the second column for calculating the means)row_meansd <-rowMeans(transposed_dominance_data, na.rm =TRUE)row_meansmf <-rowMeans(transposed_masculinityfemininity_data, na.rm =TRUE)row_meansa <-rowMeans(transposed_attractiveness_data, na.rm =TRUE)# Create means ratings data frames with IDs (from the transposed data row names) and their corresponding meansmean_ratings_df_d <-data.frame(ID =rownames(transposed_dominance_data), facial_dominance = row_meansd)mean_ratings_df_mf <-data.frame(ID =rownames(transposed_masculinityfemininity_data), facial_masculinityfemininity = row_meansmf)mean_ratings_df_a <-data.frame(ID =rownames(transposed_attractiveness_data), facial_attractiveness = row_meansa)# Merges the mean dominance ratings data frame with the individualsdata data frame to create the new data frame individualsdata, then merges the mean masculinity/femininity and attractiveness ratings with individualsdata as wellindividualsdata <-merge(individualsdata, mean_ratings_df_d, by.x ="ID", by.y ="ID", all.x =TRUE)individualsdata <-merge(individualsdata, mean_ratings_df_mf, by.x ="ID", by.y ="ID", all.x =TRUE)individualsdata <-merge(individualsdata, mean_ratings_df_a, by.x ="ID", by.y ="ID", all.x =TRUE)
I have verified that the first couple ratings are correctly calculated by checking out the individualsdata data frame directly. The average facial ratings are now the last three columns and are called facial_dominance, facial_masculinityfemininity, and facial_attractiveness.
# Writing the current individualsdata data frame to the data folderwrite.csv(individualsdata, file ="./data/individuals_data_with_ratings.csv")
Reliability of Facial Width-to-Height Ratio
To allow for reliability analyses of facial width-to-height ratio (fWHR) measures, two independent raters measured fWHR using the aspect ratio produced by Fiji (Schindelin et al., 2012). Reliability will be assessed using ICCs before averaging them together for a final fWHR estimate. Because we are not interested in generalizing to the rater population and raters were not selected, this will also be a two-way mixed effects model where we are interested in consistency of ratings and computing average scores. Therefore, the ICC which will be used to assess inter-rater agreement will be the ICC(3,k) (Shrout & Fleiss, 1979).
# Calculating ICCs from fWHR_ben and fWHR_madicc_result_fWHR <-ICC(individualsdata[, c("fWHR_ben", "fWHR_mad")])# Viewing the resultsicc_result_fWHR
Call: ICC(x = individualsdata[, c("fWHR_ben", "fWHR_mad")])
Intraclass correlation coefficients
type ICC F df1 df2 p lower bound upper bound
Single_raters_absolute ICC1 0.97 63 264 265 1.5e-162 0.96 0.98
Single_random_raters ICC2 0.97 64 264 264 4.7e-163 0.96 0.98
Single_fixed_raters ICC3 0.97 64 264 264 4.7e-163 0.96 0.98
Average_raters_absolute ICC1k 0.98 63 264 265 1.5e-162 0.98 0.99
Average_random_raters ICC2k 0.98 64 264 264 4.7e-163 0.98 0.99
Average_fixed_raters ICC3k 0.98 64 264 264 4.7e-163 0.98 0.99
Number of subjects = 265 Number of Judges = 2
See the help file for a discussion of the other 4 McGraw and Wong estimates,
The ICC(3,k) shows excellent reliability of fWHR measurements between raters (.984), so we will average them together to get a final estimate.
Calculating Average fWHR Estimates
To calculate average fWHR estimates we will create a new data frame first—to retain the data at each stage of transformation—and then add the new fWHR variable to the copied data frame, individualsdata.
# Calculate the average fWHR and add it as a new variable "fWHR" in the data frameindividualsdata$fWHR <-with(individualsdata, (fWHR_ben + fWHR_mad) /2)
The last column now contains the fWHR variable, and rows where there were no values for fWHR_ben or fWHR_mad have been marked as NA.
Analysis to Determine whether to Create Composite Behavioral Dominance Measure:
In order to assess whether to create a composite behavioral dominance measure for use in later analyses, we will assess whether the SAT and the IPIP_dom are correlated. The following code checks the assumptions of a Pearson correlation and analyzes the bivariate correlation between these variables. The SAT is actually a count variable, but I will see if I can treat it as a continuous variable by assessing these assumptions.
# Histograms and QQ-plots to assess normalityhist(individualsdata$SAT)
# Scatter plot to assess linearity and homoscedasticityplot(individualsdata$SAT, individualsdata$IPIP_dom)
# Pearson Correlation with the variablescorr1 <-cor.test(individualsdata$SAT, individualsdata$IPIP_dom, method="pearson")# Display the resultprint(corr1, short =TRUE)
Pearson's product-moment correlation
data: individualsdata$SAT and individualsdata$IPIP_dom
t = 1.8667, df = 263, p-value = 0.06306
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.00623599 0.23165526
sample estimates:
cor
0.1143486
The histogram and QQ-plot of SAT indicates that it is far form normally distributed, and, if it is, it is highly right skewed. Although the histogram and QQ-plot indicate normality for IPIP_dom, the assumptions have already been violated. This makes the scatterplot and correlation uninterpretable, but either way neither of them indicate a relationship between the variables. Just to be sure, I would like to use a correlation coefficient that does not assume normality. Perhaps with a robust test there will be a correlation. In the following chunk, I will test the assumptions and run the analysis for a Spearman rank correlation.
# Checking for outliers with boxplotsboxplot(individualsdata$SAT, main="Boxplot for SAT Scores")
boxplot(individualsdata$IPIP_dom, main="Boxplot for IPIP_dom")
# Scatterplot to assess monotonic relationshipplot(individualsdata$SAT, individualsdata$IPIP_dom, main="Scatterplot", xlab="SAT", ylab="IPIP_dom")
# Spearman's rank correlation testcorr1.5<-corr.test(individualsdata$SAT, individualsdata$IPIP_dom, method ="spearman")# Displaying the Spearman correlation resultprint(corr1.5, short =FALSE)
Call:corr.test(x = individualsdata$SAT, y = individualsdata$IPIP_dom,
method = "spearman")
Correlation matrix
[1] 0.08
Sample Size
[1] 265
These are the unadjusted probability values.
The probability values adjusted for multiple tests are in the p.adj object.
[1] 0.22
Confidence intervals based upon normal theory. To get bootstrapped values, try cor.ci
raw.lower raw.r raw.upper raw.p lower.adj upper.adj
NA-NA -0.05 0.08 0.19 0.22 -0.05 0.19
There seems to be a few outliers in the SAT scores, but I took a look at them, and they are simply four responses that reach the upper end of the scale. Given that it is completely plausible that a person considers themselves to “often feel” 9, 10, or 11 of the adjectives included when calculating the scores for this variable, I do not think it is helpful to remove these values. The scatterplot seems to indicate monotonicity, with no strong indications of a changing relationship at high or low levels of either variable. Ultimately, the correlation is not significant, and is in any case very low, so it does not make sense to compile these variables into a composite behavioral dominance measure. This result was surprising to me, because I looked back at Palmer-Hague & Watson (2016) and found that they were moderately correlated in that sample (r = .28). So I looked at the descriptive statistics for the SAT there, and they were M = 3.4, SD = 2.45. The mean and standard deviation of the SAT in this sample are as follows.
mean(individualsdata$SAT, na.rm =TRUE)
[1] 2.498113
sd(individualsdata$SAT, na.rm =TRUE)
[1] 2.205579
Clearly we have a lower mean and SD for the variable here, which I am guessing is due to the preponderance of 0-2 responses in our sample (based on the histogram). Perhaps this is due to simply lower quality responses due to online data collection, rather than having parents complete packets and have their child bring it back to the lab.
Data Cleaning for Testing Hypotheses:
Creating Dyadic Data Set
Now I am going to prepare the dyadic data set so that we can use it to test our hypotheses. First, I need to export the current data frame as a csv file.
# Create the csv file in the "data" folderwrite.csv(individualsdata, file ="./data/individuals_data_with_fWHR.csv")
In order to create the dyadic data set, I first filter out all of the individuals for which their partner did not complete the study. Then I separate the individual data set into males and females and rename the variables with m_ for the mother variables and f_ for the father variables, except I make the ID columns have suffixes of _mother and _father. I then merge them using the mother ID_mother and father partner_ID_father columns.
# Filter out participants without a partnerindividualsdata_with_partners <- individualsdata[individualsdata$partner_completed ==1,]# Separate the dataset into two data frames, one for mothers and one for fathersmothers <- individualsdata_with_partners[individualsdata_with_partners$sex =="female", ]fathers <- individualsdata_with_partners[individualsdata_with_partners$sex =="male", ]# Rename the columns for mothers (to include prefix m_ to indicate mothers) and fathers (to include prefix f_ to indicate fathers) except for the columns with IDsnames(mothers) <-ifelse(names(mothers) %in%c('ID', 'partner_ID'), paste0(names(mothers), "_mother"), paste0("m_", names(mothers)))names(fathers) <-ifelse(names(fathers) %in%c('ID', 'partner_ID'), paste0(names(fathers), "_father"), paste0("f_", names(fathers)))# Merge the datasets using the mothers' (ID) and the fathers' (partner_ID) columnsdyadic_data <-merge(mothers, fathers, by.x='ID_mother', by.y='partner_ID_father')# Save the combined dataset to a new csv file called dyadic_data.csv in the data directory without including the row nameswrite.csv(dyadic_data, file ="./data/dyadic_data.csv", row.names =FALSE)
We have now created the dyadic_data data frame and saved it as dyadic_data.csv in the data directory. ID_mother is the ID of the mother, and all of the maternal variables have the prefix m_ to indicate that this is so. partner_ID_mother is the ID of the mother’s partner (i.e., the father). Similarly, ID_father is the ID of the father, and all of the paternal variables have the prefix f_ to indicate that this is so. partner_ID_father was used to merge the data frames but it is not retained in the dyadic data set. There are now 110 dyads in dyadic_data, corresponding to the 220 participants in individualsdata who’s partner validly completed the survey.
Adding Sex of First Shared Biological Child to Dyadic Dataset
Now that the dyadic dataset is created, I must add our primary outcome variable, shared_child_sex. This is the eldest (closes to first born) biological child shared by the couple. To add this variable, I could not simply use a function in R, because the sex_first_bio_child variable of each dyad member may not be the same sex for the other member of the dyad (e.g., if the person had children with another person before beginning their romantic relationship at the time of data collection). Therefore, I did this outside of R by (1) using an Excel function to determine the shared child’s sex if the sex_first_bio_child was matching for the members of the dyad and (2) for the cases where sex_first_bio_child did not match (n = 10) checking the raw data to try to find the shared child. In n = 4 of these cases, I was able to determine the sex of the shared child by matching the sex and age for each member of the dyad, and for n = 6 of these cases, there was no matching sex and age between the members of the dyad. It is not clear why these remaining n = 6 cases do not have a matching sex and age, because I have checked each case and it is not plausible for any of them that the child had their birthday between when the parents’ completed the study. In any case, there is now a total of n = 104 dyads with data for shared_child_sex. The variable shared_child_age was also added by either using the existing ages when they, and the sex, matched, or by going to the raw data to fish out the correct age. In n = 10 cases, the partner who completed the study first reported the shared child as one year younger than the partner that completed the study second, due to the time lapse between partners completing the study. In these cases, the first response was recorded for the shared_child_age.
The following code reads in the new dataset dyadic_data_with_shared_child_sex.csv to overwrite the data frame dyadic_data.
# Read in dyadic dataset with shared child sex included to overwrite current data framedyadic_data <-read.csv("./data/dyadic_data_with_shared_child_data.csv")# Remove participants that don't have shared_child_sex data (n = 6 from above)dyadic_data <- dyadic_data[!is.na(dyadic_data$shared_child_sex), ]
The new variable shared_child_sex is now in the dyadic_data data frame (with 0 as female and 1 as male). We will do some quick descriptive statistics for these children.
# The number of male and female shared first-bornstable(dyadic_data$shared_child_sex)
0 1
41 63
# Descriptives for the shared_child_agedescribe(dyadic_data$shared_child_age)
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 104 4 2.15 4 3.94 2.97 1 9 8 0.22 -1.06 0.21
Seperating the Effect of Other Facial Characteristics and Age from Facial Ratings of Dominance Within Sexes
Although we could use the data frame organized by individuals (as we did when we calculated standardized residuals before in the whole sample) by creating new data frames from it that filter by sex, I am actually going to just use the dyadic dataset that we have already created for two reasons: (1) it will then not require me to recreate the dyadic dataset after the current procedure and (2) it will only include the individuals that are a part of a dyad, thereby removing the potential for individuals who’s romantic partner did not also complete the study to bias the parameters of the models. Each model will include facial_dominance regressed on facial_masculinityfemininity, facial_attractiveness, and age, but first we will look at the bivariate relationships between these variables within each sex.
Bivariate Correlations Between Facial Characteristics and Age within Sexes
# Calculate pairwise correlations with p-values and confidence intervals for mothers and store it in corr2corr2 <-corr.test(dyadic_data[, c("m_facial_dominance", "m_facial_attractiveness", "m_facial_masculinityfemininity", "m_age")], use="pairwise.complete.obs")# Calculate pairwise correlations with p-values and confidence intervals for fathers and store it in corr3corr3 <-corr.test(dyadic_data[, c("f_facial_dominance", "f_facial_attractiveness", "f_facial_masculinityfemininity", "f_age")], use="pairwise.complete.obs")print(corr2, short =FALSE)
Looking at the output for mothers (corr2), we can see that facial dominance is moderately positively correlated with facial attractiveness, but it is not correlated with facial masculinity/femininity or age. For fathers, there was a similar moderate positive correlation between facial dominance and no correlation between facial dominance and facial attractiveness, but, in contrast, there was a very strong positive correlation between facial dominance and facial masculinity/femininity.
Constructing the Models for Residual Extraction
The following will construct the models for both sexes, create scatter plots of residual vs. fitted values to visually check for homoscedasticity, create QQ-plots to visually check for normality of residuals, and display the summary statistics for the models.
# Defining the models# Model for mothersres_model_mothers <-lm(m_facial_dominance ~ m_facial_masculinityfemininity + m_facial_attractiveness + m_age, data = dyadic_data)# Model for fathersres_model_fathers <-lm(f_facial_dominance ~ f_facial_masculinityfemininity + f_facial_attractiveness + f_age, data = dyadic_data)# Scatter plots to check for linearity and homoscedasticity# Model for mothersplot(residuals(res_model_mothers) ~fitted(res_model_mothers))abline(h=0, col="red")title("Residuals vs Fitted for Model for Mothers")
# Model for fathersplot(residuals(res_model_fathers) ~fitted(res_model_fathers))abline(h=0, col="red")title("Residuals vs Fitted for Model for Fathers")
# Check for normality of residuals# Model for mothersqqnorm(residuals(res_model_mothers), main ="QQ Plot for Model for Mothers")qqline(residuals(res_model_mothers))
# Model for fathersqqnorm(residuals(res_model_fathers), main ="QQ Plot for Model for Fathers")qqline(residuals(res_model_fathers))
# Summary statistics for both models# Model for motherssummary_res_model_mothers <-summary(res_model_mothers)print(summary_res_model_mothers)
Call:
lm(formula = m_facial_dominance ~ m_facial_masculinityfemininity +
m_facial_attractiveness + m_age, data = dyadic_data)
Residuals:
Min 1Q Median 3Q Max
-1.14036 -0.35473 -0.00397 0.28335 1.50451
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.19873 0.77469 0.257 0.798073
m_facial_masculinityfemininity 0.42356 0.10605 3.994 0.000124 ***
m_facial_attractiveness 0.52803 0.09940 5.312 6.58e-07 ***
m_age 0.01610 0.01081 1.490 0.139343
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4798 on 100 degrees of freedom
Multiple R-squared: 0.2201, Adjusted R-squared: 0.1967
F-statistic: 9.407 on 3 and 100 DF, p-value: 1.56e-05
# Model for fatherssummary_res_model_fathers <-summary(res_model_fathers)print(summary_res_model_fathers)
Call:
lm(formula = f_facial_dominance ~ f_facial_masculinityfemininity +
f_facial_attractiveness + f_age, data = dyadic_data)
Residuals:
Min 1Q Median 3Q Max
-1.38825 -0.22864 0.05321 0.25948 1.31102
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.673267 0.577199 -2.899 0.00461 **
f_facial_masculinityfemininity 0.927528 0.083566 11.099 < 2e-16 ***
f_facial_attractiveness 0.224574 0.081895 2.742 0.00724 **
f_age 0.006773 0.008948 0.757 0.45084
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4313 on 99 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.6173, Adjusted R-squared: 0.6057
F-statistic: 53.23 on 3 and 99 DF, p-value: < 2.2e-16
Taking a look at the residual vs. fitted values, there is not a perfectly symmetrical range of residual values for mothers across the distribution of fitted values of facial dominance. This may also be the case for fathers to a lesser degree, although it is harder to tell due to fewer values at the lower end of the distribution of fitted values. For both plots, it is clear that the assumption of linearity is reasonable, but the assumption of homogeneity of variance is not so obviously supported. Still, I think that it would be very stringent—particularly considering these models will not be used to test our hypotheses—to reject the assumption of homogeneity of variance based on these plots.
Next, for the QQ plots, there is some indication that the residuals are not normally distributed at the tails of the distribution for fathers, but the plot for mothers looks pretty good. Before moving on to the summary statistics for the models, I will construct a histogram of the residuals for fathers to take a look at why the QQ plot might deviate at the ends.
Histogram of residual values for fathers:
# Histogram for the residuals for the model with fathers datahist(residuals(res_model_fathers))
The histogram for residuals of the model for fathers seems to have some deviations from normality. Given that the mean for the residual values must be zero, we can see that the most frequent bin of values is just higher than the mean. That is, using these bins, the mean seems to be lower than the mode, indicating that there is some negative skew influencing the mean, which can also simply be seen visually. However, I would like to see this histogram with smaller bins of values to get a better sense of what is going on.
hist(residuals(res_model_fathers), breaks =10)
This more detailed histogram shows the same pattern, where the most frequent bin is just above zero with more values at the lower end of the distribution than the higher end of the distribution, indicating a slight degree of negative skew.
I am getting into the weeds a bit here, because we are not trying to make inferences about the significance of the model or it’s parameters here, but I wanted to look at the normality of these residuals because they will be important for later analyses.
Finally, looking at the summaries of the models, it seems that the model for fathers explains far more variance in facial dominance than the model for mothers does. This is likely drive by the massive beta value for facial masculinity and femininity, especially considering their zero-order relationship we found doing bivariate correlations.
Calculating Standardized Residuals for Facial Dominance within Sex
Now we will calculate the standardized residuals for facial dominance using the regression models within each sex. To do this, we will create the variables m_res_facial_dominance and f_res_facial_dominance by initializing them with NAs for all cases, and we will fill these variables with the standardized residuals for their respective models.
# Identify rows without missing values (because cases with missing values were excluded from both models during their construction)full_rows_res_model_mothers <-with(dyadic_data, complete.cases(m_facial_dominance, m_facial_masculinityfemininity, m_facial_attractiveness, m_age))full_rows_res_model_fathers <-with(dyadic_data, complete.cases(f_facial_dominance, f_facial_masculinityfemininity, f_facial_attractiveness, f_age))# Add the standardized residuals to the new data frame, aligning them with the full rows# Initializing new variables with NA values firstdyadic_data$m_res_facial_dominance <-NA# Initialize new variable with NAs for mothersdyadic_data$f_res_facial_dominance <-NA# Initialize new variable with NAs for fathers# Add the standardized residuals to the mothers variabledyadic_data$m_res_facial_dominance[full_rows_res_model_mothers] <-rstandard(res_model_mothers)# Add the standardized residuals to the fathers variabledyadic_data$f_res_facial_dominance[full_rows_res_model_fathers] <-rstandard(res_model_fathers)
The dyadic_data data frame now contains the m_res_facial_dominance and f_res_facial_dominance variables in the last columns.
Before moving on, I am going to write this new dataframe as a csv file to ensure it is saved.
Assessing Whether Facial Ratings and fWHR Vary by Facial Expression Within Sexes
Now we will take a look within the sexes at whether facial ratings differ between those that had a neutral facial expression—as we asked for—or had a non-neutral facial expression. To do this, we will test for group differences using an independent samples t-test. We will do this for mothers first, and we will test assumptions first.
For Mothers
The following code tests the assumptions of the independent-samples t-test to determine whether a regular or Welch’s t-test is most appropriate. QQ-plots are produced to assess normality, and Levene’s test is employed to assess whether homogeneity of variance can be assumed. Finally, we count the number of cases that have neutral and non-neutral facial expressions to see whether there are enough cases in both groups to make valid inferences.
# Q-Q Plots to assess normalityqqnorm(dyadic_data$m_facial_dominance, main ="QQ-plot for Mother's Facial Dominance")qqline(dyadic_data$m_facial_dominance)
qqnorm(dyadic_data$m_facial_attractiveness, main ="QQ-plot for Mother's Facial Attractiveness")qqline(dyadic_data$m_facial_attractiveness)
qqnorm(dyadic_data$m_facial_masculinityfemininity, main ="QQ-plot for Mother's Facial Masculinity/Femininity")qqline(dyadic_data$m_facial_masculinityfemininity)
qqnorm(dyadic_data$m_res_facial_dominance, main ="QQ-plot for Mother's Residual Facial Dominance")qqline(dyadic_data$m_res_facial_dominance)
# Convert 'm_expression_not_neutral' to a factordyadic_data$m_expression_not_neutral <-as.factor(dyadic_data$m_expression_not_neutral)# Levene's Test to assess homogeneity of variancesleveneTest(m_facial_dominance ~ m_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.4312 0.5129
102
leveneTest(m_facial_attractiveness ~ m_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 1.0618 0.3052
102
leveneTest(m_facial_masculinityfemininity ~ m_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.1374 0.7116
102
leveneTest(m_res_facial_dominance ~ m_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.0075 0.931
102
# Check the frequency of the facial expressions that are not neutralsummary(dyadic_data$m_expression_not_neutral)
neutral not_neutral
92 12
The QQ-plots for the mothers facial dominance, facial attractiveness, and facial masculinity/femininity do not look great—although residual facial dominance, which we are principally interested in here, looks pretty good—but with n = 104 cases, I think we can rely on the central limit theorum. For each of the Levene’s test, the p-value does not fall below p = .305, so we can assume homogeneity of variance for between the groups. However, there are n = 92 cases in the neutral facial expression group and only n = 12 cases in the non-neutral facial expression group. Because of this, I will do both a normal independent samples t-test and a Welch’s t-test for each comparison, just to make sure inferences are appropriate.
# Performing the regular t-tests (two-way)t_test_dom_expression <-t.test(m_facial_dominance ~ m_expression_not_neutral, data = dyadic_data, var.equal =TRUE)t_test_att_expression <-t.test(m_facial_attractiveness ~ m_expression_not_neutral, data = dyadic_data, var.equal =TRUE)t_test_masfem_expression <-t.test(m_facial_masculinityfemininity ~ m_expression_not_neutral, data = dyadic_data, var.equal =TRUE)t_test_res_dom_expression <-t.test(m_res_facial_dominance ~ m_expression_not_neutral, data = dyadic_data, var.equal =TRUE)# Performing the Welch's t-tests (two-way)w_t_test_dom_expression <-t.test(m_facial_dominance ~ m_expression_not_neutral, data = dyadic_data, var.equal =FALSE)w_t_test_att_expression <-t.test(m_facial_attractiveness ~ m_expression_not_neutral, data = dyadic_data, var.equal =FALSE)w_t_test_masfem_expression <-t.test(m_facial_masculinityfemininity ~ m_expression_not_neutral, data = dyadic_data, var.equal =FALSE)w_t_test_res_dom_expression <-t.test(m_res_facial_dominance ~ m_expression_not_neutral, data = dyadic_data, var.equal =FALSE)# Printing the resultst_test_dom_expression
Two Sample t-test
data: m_facial_dominance by m_expression_not_neutral
t = 2.1763, df = 102, p-value = 0.03184
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.0311239 0.6713665
sample estimates:
mean in group neutral mean in group not_neutral
3.824159 3.472914
w_t_test_dom_expression
Welch Two Sample t-test
data: m_facial_dominance by m_expression_not_neutral
t = 2.2758, df = 14.418, p-value = 0.0386
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.02111682 0.68137363
sample estimates:
mean in group neutral mean in group not_neutral
3.824159 3.472914
t_test_att_expression
Two Sample t-test
data: m_facial_attractiveness by m_expression_not_neutral
t = -1.7386, df = 102, p-value = 0.08512
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
-0.85526641 0.05627268
sample estimates:
mean in group neutral mean in group not_neutral
3.282658 3.682154
w_t_test_att_expression
Welch Two Sample t-test
data: m_facial_attractiveness by m_expression_not_neutral
t = -1.5254, df = 13.129, p-value = 0.1509
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
-0.9647153 0.1657216
sample estimates:
mean in group neutral mean in group not_neutral
3.282658 3.682154
t_test_masfem_expression
Two Sample t-test
data: m_facial_masculinityfemininity by m_expression_not_neutral
t = 2.2341, df = 102, p-value = 0.02766
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.05155174 0.86744362
sample estimates:
mean in group neutral mean in group not_neutral
3.121890 2.662393
w_t_test_masfem_expression
Welch Two Sample t-test
data: m_facial_masculinityfemininity by m_expression_not_neutral
t = 2.5291, df = 15.251, p-value = 0.02292
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.07280782 0.84618754
sample estimates:
mean in group neutral mean in group not_neutral
3.121890 2.662393
t_test_res_dom_expression
Two Sample t-test
data: m_res_facial_dominance by m_expression_not_neutral
t = 2.8848, df = 102, p-value = 0.004779
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.2680257 1.4476950
sample estimates:
mean in group neutral mean in group not_neutral
0.09918414 -0.75867621
w_t_test_res_dom_expression
Welch Two Sample t-test
data: m_res_facial_dominance by m_expression_not_neutral
t = 2.6666, df = 13.45, p-value = 0.01893
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.165223 1.550498
sample estimates:
mean in group neutral mean in group not_neutral
0.09918414 -0.75867621
Mother’s Facial Dominance:
Both the regular and Welch’s t-test indicate that there is a group difference in dominance ratings between the neutral and non-neutral facial expression groups, with the neutral group (M = 3.824) rated as more dominant than the non-neutral group (M = 3.473). This is presumably because most of the non-neutral faces were smiling.
Mother’s Facial Attractiveness:
The both versions of the t-test indicate that there is a marginal group difference in attractiveness ratings between the neutral and non-neutral facial expression groups, with the neutral group (M = 3.283) rated as less attractive than the non-neutral group (M = 3.682) (again, I presume due to smiling).
Mother’s Facial Masculinity/Femininity:
Both the regular and Welch’s t-test indicate that there is a group difference in masculinity/femininity ratings between the neutral and non-neutral facial expression groups, with the neutral group (M = 3.122) rated as more masculine (less feminine) than the non-neutral group (M = 2.662). If I am not mistaken, there are sex differences in the frequency of smiling such that women tend to smile more, which may be why neutral facial expressions were rated as less feminine (more masculine) for these women.
Mother’s Residual Facial Dominance:
Both the regular and Welch’s t-test indicate that there is a group difference in residual dominance scores between the neutral and non-neutral facial expression groups, with the neutral group (M = 0.099) higher on residual dominance than the non-neutral group (M = -0.759). This is presumably because most of the non-neutral faces were smiling.
This last result indicates that when testing hypotheses later in the analysis, we would construct the model both with and without the non-neutral facial expressions to ensure that this is not affecting the results. I am curious how large these group differences are, so the following code will calculate the Cohen’s d for each of these group differences.
cohen.d(m_facial_dominance ~ m_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = m_facial_dominance ~ m_expression_not_neutral, data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
m_facial_dominance -1.28 -0.67 -0.06
Multivariate (Mahalanobis) distance between groups
[1] 0.67
r equivalent of difference between two means
m_facial_dominance
-0.21
cohen.d(m_facial_attractiveness ~ m_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = m_facial_attractiveness ~ m_expression_not_neutral,
data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
m_facial_attractiveness -0.07 0.54 1.14
Multivariate (Mahalanobis) distance between groups
[1] 0.54
r equivalent of difference between two means
m_facial_attractiveness
0.17
cohen.d(m_facial_masculinityfemininity ~ m_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = m_facial_masculinityfemininity ~ m_expression_not_neutral,
data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
m_facial_masculinityfemininity -1.3 -0.69 -0.08
Multivariate (Mahalanobis) distance between groups
[1] 0.69
r equivalent of difference between two means
m_facial_masculinityfemininity
-0.22
cohen.d(m_res_facial_dominance ~ m_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = m_res_facial_dominance ~ m_expression_not_neutral,
data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
m_res_facial_dominance -1.51 -0.89 -0.28
Multivariate (Mahalanobis) distance between groups
[1] 0.89
r equivalent of difference between two means
m_res_facial_dominance
-0.27
Given the conventional standards for Cohen’s d, facial dominance, facial attractiveness, and facial masculinity/femininity each show medium group differences. In addition, the estimate of Cohen’s d for residual facial dominance scores is fairly large (d = -.89).
For Fathers
The following code tests the assumptions of the independent-samples t-test to determine whether a regular or Welch’s t-test is most appropriate. QQ-plots are produced to assess normality, and Levene’s test is employed to assess whether homogeneity of variance can be assumed. Finally, we count the number of cases that have neutral and non-neutral facial expressions to see whether there are enough cases in both groups to make valid inferences.
# Q-Q Plots to assess normalityqqnorm(dyadic_data$f_facial_dominance, main ="QQ-plot for Fathers's Facial Dominance")qqline(dyadic_data$f_facial_dominance)
qqnorm(dyadic_data$f_facial_attractiveness, main ="QQ-plot for Fathers's Facial Attractiveness")qqline(dyadic_data$f_facial_attractiveness)
qqnorm(dyadic_data$f_facial_masculinityfemininity, main ="QQ-plot for Fathers's Facial Masculinity/Femininity")qqline(dyadic_data$f_facial_masculinityfemininity)
qqnorm(dyadic_data$f_res_facial_dominance, main ="QQ-plot for Fathers's Residual Facial Dominance")qqline(dyadic_data$f_res_facial_dominance)
# Convert 'f_expression_not_neutral' to a factor variabledyadic_data$f_expression_not_neutral <-as.factor(dyadic_data$f_expression_not_neutral)# Levene's Test to assess homogeneity of variancesleveneTest(f_facial_dominance ~ f_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 6.1346 0.01492 *
101
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
leveneTest(f_facial_attractiveness ~ f_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.0193 0.8898
101
leveneTest(f_facial_masculinityfemininity ~ f_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.0018 0.9658
101
leveneTest(f_res_facial_dominance ~ f_expression_not_neutral, data = dyadic_data)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 4.0448 0.04697 *
101
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check the frequency of the facial expressions that are not neutralsummary(dyadic_data$f_expression_not_neutral)
neutral not_neutral
91 13
The QQ-plots for the fathers facial dominance, facial attractiveness, facial masculinity/femininity, and residual facial dominance are not as uniform as I would like them to be, but, as with above, having n = 104 cases should allow us to be liberal with the assumption of normality. However, Levene’s test indicates that we should reject the null hypothesis that variances are equal for facial dominance and, almost, for residual facial dominance. Further, there is a large difference is the number of cases in each group (neutral n = 91 and non-neutral n = 13). Taken together, this indicates that we should make our inferences from the results for Welch’s t-tests below.
# Performing the regular t-tests (two-way)t_test_dom_expression_f <-t.test(f_facial_dominance ~ f_expression_not_neutral, data = dyadic_data, var.equal =TRUE)t_test_att_expression_f <-t.test(f_facial_attractiveness ~ f_expression_not_neutral, data = dyadic_data, var.equal =TRUE)t_test_masfem_expression_f <-t.test(f_facial_masculinityfemininity ~ f_expression_not_neutral, data = dyadic_data, var.equal =TRUE)t_test_res_dom_expression_f <-t.test(f_res_facial_dominance ~ f_expression_not_neutral, data = dyadic_data, var.equal =TRUE)# Performing the Welch's t-tests (two-way)w_t_test_dom_expression_f <-t.test(f_facial_dominance ~ f_expression_not_neutral, data = dyadic_data, var.equal =FALSE)w_t_test_att_expression_f <-t.test(f_facial_attractiveness ~ f_expression_not_neutral, data = dyadic_data, var.equal =FALSE)w_t_test_masfem_expression_f <-t.test(f_facial_masculinityfemininity ~ f_expression_not_neutral, data = dyadic_data, var.equal =FALSE)w_t_test_res_dom_expression_f <-t.test(f_res_facial_dominance ~ f_expression_not_neutral, data = dyadic_data, var.equal =FALSE)# Printing the resultst_test_dom_expression_f
Two Sample t-test
data: f_facial_dominance by f_expression_not_neutral
t = 2.9647, df = 101, p-value = 0.003781
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.199431 1.006044
sample estimates:
mean in group neutral mean in group not_neutral
4.316144 3.713407
w_t_test_dom_expression_f
Welch Two Sample t-test
data: f_facial_dominance by f_expression_not_neutral
t = 4.7533, df = 23.491, p-value = 8.184e-05
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.3407260 0.8647491
sample estimates:
mean in group neutral mean in group not_neutral
4.316144 3.713407
t_test_att_expression_f
Two Sample t-test
data: f_facial_attractiveness by f_expression_not_neutral
t = -2.6664, df = 101, p-value = 0.008929
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
-0.8194121 -0.1202965
sample estimates:
mean in group neutral mean in group not_neutral
2.974734 3.444589
w_t_test_att_expression_f
Welch Two Sample t-test
data: f_facial_attractiveness by f_expression_not_neutral
t = -2.5114, df = 13.604, p-value = 0.02533
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
-0.8722178 -0.0674908
sample estimates:
mean in group neutral mean in group not_neutral
2.974734 3.444589
t_test_masfem_expression_f
Two Sample t-test
data: f_facial_masculinityfemininity by f_expression_not_neutral
t = 2.0491, df = 101, p-value = 0.04304
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.01051675 0.64843906
sample estimates:
mean in group neutral mean in group not_neutral
5.427786 5.098308
w_t_test_masfem_expression_f
Welch Two Sample t-test
data: f_facial_masculinityfemininity by f_expression_not_neutral
t = 2.0051, df = 13.888, p-value = 0.06484
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
-0.02321507 0.68217089
sample estimates:
mean in group neutral mean in group not_neutral
5.427786 5.098308
t_test_res_dom_expression_f
Two Sample t-test
data: f_res_facial_dominance by f_expression_not_neutral
t = 3.1036, df = 101, p-value = 0.00248
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.3346447 1.5202643
sample estimates:
mean in group neutral mean in group not_neutral
0.1086182 -0.8188363
w_t_test_res_dom_expression_f
Welch Two Sample t-test
data: f_res_facial_dominance by f_expression_not_neutral
t = 2.2173, df = 12.221, p-value = 0.04628
alternative hypothesis: true difference in means between group neutral and group not_neutral is not equal to 0
95 percent confidence interval:
0.0179447 1.8369642
sample estimates:
mean in group neutral mean in group not_neutral
0.1086182 -0.8188363
Father’s Facial Dominance:
The Welch’s t-test indicates that there is a difference between the neutral and non-neutral group such that the neutral group (M = 4.316) was rated significantly more dominant than the non-neutral group (M = 3.713). I suspect this is due to most non-neutral expressions smiling.
Father’s Facial Attractiveness:
The Welch’s t-test indicates that there is a difference between the neutral and non-neutral group such that the neutral group (M = 2.975) was rated significantly less attractive than the non-neutral group (M = 3.445) (probably because they were mostly smiling).
Father’s Facial Masculinity/Femininity:
The Welch’s t-test (and regular t-test) was close to our level for significance here (p = .06), suggesting there may be a difference between the neutral and non-neutral group such that the neutral group (M = 5.428) was rated significantly more masculine (less feminine) than the non-neutral group (M = 5.098). Again, I suspect that because women tend to smile more than men, the smiling can account for the difference in ratings here, but I would need to look into this.
Father’s Residual Facial Dominance:
The Welch’s t-test indicates that there is a difference between the neutral and non-neutral group such that the neutral group (M = 0.109) has significantly higher residual facial dominance scores than the non-neutral group (M = -0.819). I suspect this is due to most non-neutral expressions smiling.
This last result indicates that when testing hypotheses later in the analysis, we would construct the model both with and without the non-neutral facial expressions to ensure that this is not affecting the results. And, again, the following code will calculate the Cohen’s d for each of these group differences.
cohen.d(f_facial_dominance ~ f_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = f_facial_dominance ~ f_expression_not_neutral, data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
f_facial_dominance -1.53 -0.92 -0.3
Multivariate (Mahalanobis) distance between groups
[1] 0.92
r equivalent of difference between two means
f_facial_dominance
-0.28
cohen.d(f_facial_attractiveness ~ f_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = f_facial_attractiveness ~ f_expression_not_neutral,
data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
f_facial_attractiveness 0.21 0.83 1.44
Multivariate (Mahalanobis) distance between groups
[1] 0.83
r equivalent of difference between two means
f_facial_attractiveness
0.26
cohen.d(f_facial_masculinityfemininity ~ f_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = f_facial_masculinityfemininity ~ f_expression_not_neutral,
data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
f_facial_masculinityfemininity -1.24 -0.64 -0.03
Multivariate (Mahalanobis) distance between groups
[1] 0.64
r equivalent of difference between two means
f_facial_masculinityfemininity
-0.2
cohen.d(f_res_facial_dominance ~ f_expression_not_neutral, data = dyadic_data)
Call: cohen.d(x = f_res_facial_dominance ~ f_expression_not_neutral,
data = dyadic_data)
Cohen d statistic of difference between two means
lower effect upper
f_res_facial_dominance -1.58 -0.96 -0.34
Multivariate (Mahalanobis) distance between groups
[1] 0.96
r equivalent of difference between two means
f_res_facial_dominance
-0.3
Interestingly, it seems that having a non-neutral facial expression had a slightly larger effect on rated facial dominance and residual facial dominance in men than in the previous analyses for women (both large effects; d = -.92 and d = -.96, respectively). For facial masculinity/femininity and facial attractiveness, effect sizes were similar to those of the mothers group.
Demographic and Recruitment Characteristics
Before moving on, I want to quickly take a look at some demographic and recruitment characteristics within the mothers and fathers in the dyadic dataset. That is, we will look at the method of recruitment (m_rec_method and f_rec_method), whether the participant resubmitted their photograph at our request (m_photo_resubmission and f_photo_resubmission), ethnicity (m_ethnicity and f_ethnicity), nationality (m_nationality and f_nationality), and age (m_age and f_age). Some of these variables need to be recoded as factor variables—in accordance with the labels specified in the Introduction above—which will be done before producing the descriptive statistics.
# Recoding variables currently numeric as factorsdyadic_data$m_rec_meth <-as.factor(dyadic_data$m_rec_meth)dyadic_data$f_rec_meth <-as.factor(dyadic_data$f_rec_meth)dyadic_data$m_photo_resubmission <-as.factor(dyadic_data$m_photo_resubmission)dyadic_data$f_photo_resubmission <-as.factor(dyadic_data$f_photo_resubmission)dyadic_data$m_ethnicity <-as.factor(dyadic_data$m_ethnicity)dyadic_data$f_ethnicity <-as.factor(dyadic_data$f_ethnicity)dyadic_data$m_nationality <-as.factor(dyadic_data$m_nationality)dyadic_data$f_nationality <-as.factor(dyadic_data$f_nationality)# Frequencies and barplot for mothers' recruitment methodtable_m_rec_meth <-table(dyadic_data$m_rec_meth)print(table_m_rec_meth)
prolific school
103 1
barplot(table_m_rec_meth, main ="Barplot of Mothers Recruitment Method")
# Frequencies and barplot for fathers' recruitment methodtable_f_rec_meth <-table(dyadic_data$f_rec_meth)print(table_f_rec_meth)
prolific school
103 1
barplot(table_f_rec_meth, main ="Barplot of Fathers Recruitment Method")
# Frequencies and barplot for mothers' photograph resubmissionstable_m_photo_resubmission <-table(dyadic_data$m_photo_resubmission)print(table_m_photo_resubmission)
had_resubmission no_resubmission
16 88
barplot(table_m_photo_resubmission, main ="Barplot of Mothers Photo Resubmissions")
# Frequencies and barplot for fathers' photograph resubmissionstable_f_photo_resubmission <-table(dyadic_data$f_photo_resubmission)print(table_f_photo_resubmission)
had_resubmission no_resubmission
14 90
barplot(table_f_photo_resubmission, main ="Barplot of Fathers Photo Resubmissions")
# Frequencies and barplot for mothers' ethnicitytable_m_ethnicity <-table(dyadic_data$m_ethnicity)print(table_m_ethnicity)
African Black/African_American Black/British
21 4 1
Caribbean East Asian Latino/Hispanic
1 4 1
Middle Eastern Mixed South Asian
1 3 4
South East Asian White/Caucasian
2 62
barplot(table_m_ethnicity, las =2, cex.names = .8, main ="Barplot of Mothers Ethnicity")
# Frequencies and barplot for fathers' ethnicitytable_f_ethnicity <-table(dyadic_data$f_ethnicity)print(table_f_ethnicity)
African Black/African_American Black/British
21 2 1
Caribbean East Asian Indian
1 4 1
Latino/Hispanic Middle Eastern Mixed
2 1 1
South Asian South East Asian White/Caucasian
2 2 66
barplot(table_f_ethnicity, las =2, cex.names = .8, main ="Barplot of Fathers Ethnicity")
# Frequencies and barplot for mothers' nationalitytable_m_nationality <-table(dyadic_data$m_nationality)print(table_m_nationality)
Australia Austria Bulgaria Canada Hungary
1 1 1 2 2
Italy Netherlands New Zealand Nigeria Phillipines
2 1 1 1 1
Poland Portugal South Africa Spain United Kingdom
2 5 21 3 48
United States Zimbabwe
9 3
barplot(table_m_nationality, las =2, cex.names = .8, main ="Barplot of Mothers Nationality")
# Frequencies and barplot for fathers' ethnicitytable_f_nationality <-table(dyadic_data$f_nationality)print(table_f_nationality)
Australia Canada Hungary Italy Netherlands
1 2 1 3 1
New Zealand Nigeria Pakistan Poland Portugal
1 1 1 1 4
South Africa Spain Turkey United Kingdom United States
21 1 1 52 10
Zimbabwe
3
barplot(table_f_nationality, las =2, cex.names = .8, main ="Barplot of Fathers Nationality")
Recruitment Method:
As expected, only one participant from recruitment from schools was included, because only one couple submitted both facial photographs.
Photograph Resubmissions:
The vast majority of people did not need to resubmit their photograph, with n = 16 in the mothers’ group and n = 14 in the fathers’ group having resubmissions.
Ethnicity:
Regarding ethnicity, the bar plots for mothers and fathers both indicate that the majority of mothers and fathers were white/caucasian (n = 62 and n = 66, respectively), and the second most common ethnic category for each group was African (n = 21 and n = 21, respectively). No other ethnic category had more than n = 5 cases.
Nationality:
For nationality, the bar plots for mothers’ and fathers’ nationality indicate that the most common nationality was United Kingdom (n = 48 and n = 52, respectively), followed by South Africa (n = 21 and n = 21, respectively) and the United States (n = 9 and n = 10, respectively). No other nationality had more than n = 5 cases.
A Quick Look at the Maternal Dominance Hypothesis for Individual Differences in Condition
Before moving on to test our Hypothesis 2 or Hypothesis 3, I am curious if we can replicate the maternal dominance hypothesis (Grant, 1990), so I will run some simple t-tests with sex of first born child as the grouping variable and the SAT dominance and IPIP dominance measures as the dependent variable. To retain all of the female participants, I will run this analysis by creating a data frame called female_individuals with only female individuals from the individualsdata data frame. First, I will test the assumptions, then I will do the t-tests.
# Create a data frame with only female individuals from the individualsdata data framefemale_individuals <- individualsdata[individualsdata$sex =="female", ]# Q-Q Plots to assess normalityqqnorm(female_individuals$SAT, main ="QQ-plot for Female Individuals SAT")qqline(female_individuals$SAT)
qqnorm(female_individuals$IPIP_dom, main ="QQ-plot for Female Individuals IPIP")qqline(female_individuals$IPIP_dom)
# Levene's Test to assess homogeneity of variancesleveneTest(IPIP_dom ~ sex_first_bio_child, data = female_individuals)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.2626 0.6092
139
leveneTest(SAT ~ sex_first_bio_child, data = female_individuals)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.3158 0.5751
139
The QQ-plot for the SAT dominance has some deviation from predicted values at the lower end of the distribution, but the IPIP dominance looks good. Also, there are n = 140 observations for each variable, so I am not worried about violations of the assumption of normality. Based on Levene’s test, we will accept the null hypothesis that there is no difference between the variances of the two groups.
Now for Student’s t-test.
# Saving the results of the t-testsmdh_t_test_SAT <-t.test(SAT ~ sex_first_bio_child, data = female_individuals, var.equal =TRUE)mdh_t_test_IPIP <-t.test(IPIP_dom ~ sex_first_bio_child, data = female_individuals, var.equal =TRUE)# Displaying the resultsmdh_t_test_SAT
Two Sample t-test
data: SAT by sex_first_bio_child
t = 0.22269, df = 139, p-value = 0.8241
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-0.6481232 0.8126540
sample estimates:
mean in group female mean in group male
2.423729 2.341463
mdh_t_test_IPIP
Two Sample t-test
data: IPIP_dom by sex_first_bio_child
t = -0.57983, df = 139, p-value = 0.563
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-3.375335 1.844537
sample estimates:
mean in group female mean in group male
25.27119 26.03659
In this dataset, we are unable to conceptually replicate the individual differences version of (Grant, 1990) using a simple t-test, although it is worth noting that her behavioral dominance data came from just before the children were born. By adding in age_first_bio_child as a covariate in a general linear model with sex_first_bio_child as a predictor and the behavioral dominance measures as outcome variables, we could see whether the time since birth of the child is potentially confounding the results here.
# Constructing the modelsMDH_glm_SAT <-lm(SAT ~ sex_first_bio_child + age_first_bio_child, data = female_individuals)MDH_glm_IPIP <-lm(SAT ~ sex_first_bio_child + age_first_bio_child, data = female_individuals)# Plotting residuals vs. fitted values for linearity and homoscedasticity and producing QQ plots for normality of residualsplot(MDH_glm_SAT)
plot(MDH_glm_IPIP)
# Durbin-Watson test for independence of errorsdwtest(MDH_glm_SAT)
Durbin-Watson test
data: MDH_glm_SAT
DW = 2.1188, p-value = 0.7698
alternative hypothesis: true autocorrelation is greater than 0
dwtest(MDH_glm_IPIP)
Durbin-Watson test
data: MDH_glm_IPIP
DW = 2.1188, p-value = 0.7698
alternative hypothesis: true autocorrelation is greater than 0
# Displaying the resultssummary(MDH_glm_SAT)
Call:
lm(formula = SAT ~ sex_first_bio_child + age_first_bio_child,
data = female_individuals)
Residuals:
Min 1Q Median 3Q Max
-2.5059 -1.4212 -0.4008 1.5153 8.6415
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.33654 0.45238 5.165 8.23e-07 ***
sex_first_bio_childmale -0.08388 0.37072 -0.226 0.821
age_first_bio_child 0.02117 0.08575 0.247 0.805
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.171 on 138 degrees of freedom
Multiple R-squared: 0.0007979, Adjusted R-squared: -0.01368
F-statistic: 0.0551 on 2 and 138 DF, p-value: 0.9464
summary(MDH_glm_IPIP)
Call:
lm(formula = SAT ~ sex_first_bio_child + age_first_bio_child,
data = female_individuals)
Residuals:
Min 1Q Median 3Q Max
-2.5059 -1.4212 -0.4008 1.5153 8.6415
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.33654 0.45238 5.165 8.23e-07 ***
sex_first_bio_childmale -0.08388 0.37072 -0.226 0.821
age_first_bio_child 0.02117 0.08575 0.247 0.805
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.171 on 138 degrees of freedom
Multiple R-squared: 0.0007979, Adjusted R-squared: -0.01368
F-statistic: 0.0551 on 2 and 138 DF, p-value: 0.9464
Most of the assumptions are met here, but the residuals vs. fitted values plots for both models look like there may be heteroscedasticity. Because of this, I will more formally test this with a Breusch-Pagan test.
# Conducting the Breusch-Pagan test for homogeneity of variancebptest(MDH_glm_SAT)
studentized Breusch-Pagan test
data: MDH_glm_SAT
BP = 0.72161, df = 2, p-value = 0.6971
bptest(MDH_glm_IPIP)
studentized Breusch-Pagan test
data: MDH_glm_IPIP
BP = 0.72161, df = 2, p-value = 0.6971
The Breusch-Pagan test is not significant, so we will accept the null hypothesis that the variance of the residuals is constant across levels of the independent variable. Taking a look at the summaries of the models, it does not seem that adding in age_first_bio_child as a covariate affected the way that sex_first_bio_child relates (or does not relate, rather) to SAT or to IPIP dominance. Furthermore, neither model explains much of the dependent variable as determined by the F-tests.
Now I am going to do the same analysis for the self-reported dominance version of the dominance and prestige scale (Cheng et al., 2010) (honestly, because I forgot to do it the first time). First, to check the assumptions of the t-test we will do a QQ plot and a Levene’s test.
# QQ-plot for normalityqqnorm(female_individuals$SR_dom_cheng, main ="QQ-plot for Female Individuals SR Dominance (Cheng)")qqline(female_individuals$SR_dom_cheng)
# Levene's Test to assess homogeneity of variancesleveneTest(IPIP_dom ~ sex_first_bio_child, data = female_individuals)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.2626 0.6092
139
The QQ-plot deviates from normality at the ends a bit, but I think with n = 140 observations we will be OK to assume normality for this distribution. Just in case, however, I will run the t-test as Students and as Welsch’s (which is more rhobust to violations of normality) to be sure. We can also accept the null hypothesis of homogeneity of variances as per the Levene’s test.
Now for the t-tests.
# Saving the results of the t-testmdh_t_test_srcheng <-t.test(SR_dom_cheng ~ sex_first_bio_child, data = female_individuals, var.equal =TRUE)mdh_w_t_test_srcheng <-t.test(SR_dom_cheng ~ sex_first_bio_child, data = female_individuals, var.equal =FALSE)# Displaying the resultsmdh_t_test_srcheng
Two Sample t-test
data: SR_dom_cheng by sex_first_bio_child
t = -1.9308, df = 139, p-value = 0.05555
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-0.671677323 0.007973313
sample estimates:
mean in group female mean in group male
2.561441 2.893293
mdh_w_t_test_srcheng
Welch Two Sample t-test
data: SR_dom_cheng by sex_first_bio_child
t = -1.9718, df = 133.44, p-value = 0.0507
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-0.664726135 0.001022125
sample estimates:
mean in group female mean in group male
2.561441 2.893293
Interestingly, the results of this t-test—although not significant at our alpha level—indicate that the mothers of female first-born children are lower in self-reported behavioral dominance than the mothers of first-born children, as would be predicted by the MDH. I would like to see whether this result becomes more robust if we put it into the general linear model with age of first-born child as a coveriate (as done above).
# Constructing the modelMDH_glm_srcheng <-lm(SR_dom_cheng ~ sex_first_bio_child + age_first_bio_child, data = female_individuals)# Plotting residuals vs. fitted values for linearity and homoscedasticity and producing QQ plots for normality of residualsplot(MDH_glm_srcheng)
# Durbin-Watson test for independence of errorsdwtest(MDH_glm_srcheng)
Durbin-Watson test
data: MDH_glm_srcheng
DW = 1.7273, p-value = 0.05483
alternative hypothesis: true autocorrelation is greater than 0
# Displaying the resultssummary(MDH_glm_srcheng)
Call:
lm(formula = SR_dom_cheng ~ sex_first_bio_child + age_first_bio_child,
data = female_individuals)
Residuals:
Min 1Q Median 3Q Max
-1.9048 -0.6934 -0.0886 0.6468 4.1541
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.80380 0.20886 13.424 <2e-16 ***
sex_first_bio_childmale 0.33635 0.17116 1.965 0.0514 .
age_first_bio_child -0.05885 0.03959 -1.486 0.1395
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.002 on 138 degrees of freedom
Multiple R-squared: 0.04146, Adjusted R-squared: 0.02757
F-statistic: 2.985 on 2 and 138 DF, p-value: 0.05383
Like above, I want to ensure that the residuals for the model are have constant variance, so I will formally test this using the Breusch-Pagan test.
bptest(MDH_glm_srcheng)
studentized Breusch-Pagan test
data: MDH_glm_srcheng
BP = 0.75531, df = 2, p-value = 0.6855
The Breusch-Pagan test indicates we should accept the null of homogeneity of variance.
Looking at the output of the summary of the model, we can see that the F-test was not significant, and the predictive ability of sex of first born child did not flip significance.
In sum, it does not look like there is evidence here for the individual differences version of the maternal dominance hypothesis. Although the t-test using self-reported dominance was close to significant, I suspect that this is due to sampling variability, given the lack of evidence (and, indeed, a reversed effect) for the other dominance measures, including the SAT, which showed the original effect.
Testing Hypothesis 2
Notes About Subsequent Analyses
Because the models involved in testing our Hypothesis 2 are moderation models and require probing interactions for effects of the focal predictor at different levels of the moderator, I wanted to find a statistical package that would be able to automate the simple-slopes analysis, becuase it can be quite tedious, especially when working with multiple different models, requiring the creation of many different variables. I was also interested in finding a package that can do the Johnson-Neyman procedure to probe for “regions of significance”—the exact range of the moderator for which the slope of the focal predictor is significant (Johnson & Neyman, 1936). This approach is especially advantageous in cases such as ours where the choice of the exact high and low values of the moderator is relatively arbitrary and this choice may not represent the nature of the interaction at different values than we have chosen. Because our hypothesis actually predicts an interaction effect at high levels of the moderator a priori(Palmer-Hague & Watson, 2016), I needed to find something that would probe the interaction at different levels of the moderator without requiring the interaction in the model to be significant. ‘interactions’ allows for probing of interactions using both simple-slopes analysis and Johnson-Neyman intervals of significance in either general linear models or generalized linear models (Long, 2021). Therefore, in the following section of the analysis we will be using the base R package GLM that fits generalized linear models to fit our binary logistic regression models and assess the model summary statistics, then we will use the ‘interactions’ package to probe interaction effects using both simple-slopes and Johnson-Neyman output.
As we saw in the Assessing Whether Facial Ratings and fWHR Vary by Facial Expression Within Sexes section, our non-neutral facial expression and neutral facial expression groups differ in their aggregate levels of facial rating characteristics. Because of this, when we do analyses with facial dominance as a predictor we will repeat the analysis within a dataframe that only contains the neutral facial expressions to check if the result is still robust.
To simplify the interpretation of our models, each of the predictor variables will be standardized before entering them in as predictors. This will also make the interaction effects more easily interpretable.
Each model will be constructed with only the main effects first, then the interaction effect will be added to determine whether it significantly improves the fit of the model (using a likelihood ratio test).
When predictors significantly add to the model fit, their coefficients will be converted to odds-ratios through exponentiation.
shared_child_sex is coded as female = 0 and male = 1, so in all analyses we are modeling the probability of having a first born son.
We will begin by assessing whether maternal and paternal behavioral dominance (the SAT and the IPIP) interact to predict offspring sex.
SAT
First, we will standardize our independent variables, m_SAT and f_SAT.
# Standardizing the m_SAT and f_SAT variables as z_m_SAT and z_f_SATdyadic_data$z_m_SAT <-scale(dyadic_data$m_SAT, center =TRUE, scale =TRUE)dyadic_data$z_f_SAT <-scale(dyadic_data$f_SAT, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of SAT for mothers and fathersSAT_main_effects_model <-glm(shared_child_sex ~ z_m_SAT + z_f_SAT, family =binomial(link = logit), data = dyadic_data)# Fitting the model with the main effects of SAT for mothers and fathers as well as their interactionSAT_interaction_model <-glm(shared_child_sex ~ z_m_SAT + z_f_SAT + z_m_SAT:z_f_SAT, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(SAT_main_effects_model, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(SAT_interaction_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(SAT_main_effects_model)
z_m_SAT z_f_SAT
1.044477 1.044477
1/vif(SAT_main_effects_model)
z_m_SAT z_f_SAT
0.9574171 0.9574171
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. The below code determines the lowest value of z_m_SAT and z_f_sat so that we can add a constant that makes all values for the variable positive before making the natural log transformation.
min(dyadic_data$z_m_SAT)
[1] -1.08544
min(dyadic_data$z_f_SAT)
[1] -1.125062
We can see that the lowest value for each variable is between -1 and -2, so I will add 2 to each variable then take the natural log of the resulting variable.
# Creating c_z_m_SAT, which represents mothers' SAT scores after adding a constant of 2dyadic_data$c_z_m_SAT <- dyadic_data$z_m_SAT +2# Creating ln_c_z_m_SAT, which represents the natural log of mothers' SAT scores after adding a constant of 2dyadic_data$ln_c_z_m_SAT <-log(dyadic_data$c_z_m_SAT)# Creating c_z_f_SAT, which represents fathers' SAT scores after adding a constant of 2dyadic_data$c_z_f_SAT <- dyadic_data$z_f_SAT +2# Creating ln_c_z_f_SAT, which represents the natural log of fathers' SAT scores after adding a constant of 2dyadic_data$ln_c_z_f_SAT <-log(dyadic_data$c_z_f_SAT)
Through manually doing the transformations for a couple numbers, I believe that this transformation was done correctly.
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_SAT_model <-glm(shared_child_sex ~ c_z_m_SAT + c_z_f_SAT + c_z_m_SAT:ln_c_z_m_SAT + c_z_f_SAT:ln_c_z_f_SAT, family =binomial(link = logit), data = dyadic_data)summary(BT_test_SAT_model)
Call:
glm(formula = shared_child_sex ~ c_z_m_SAT + c_z_f_SAT + c_z_m_SAT:ln_c_z_m_SAT +
c_z_f_SAT:ln_c_z_f_SAT, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.6833 2.6238 1.785 0.0743 .
c_z_m_SAT -4.4486 1.9238 -2.312 0.0208 *
c_z_f_SAT 0.7132 1.6524 0.432 0.6660
c_z_m_SAT:ln_c_z_m_SAT 2.3856 1.0679 2.234 0.0255 *
c_z_f_SAT:ln_c_z_f_SAT -0.3565 0.8972 -0.397 0.6911
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 132.62 on 99 degrees of freedom
AIC: 142.62
Number of Fisher Scoring iterations: 4
Unfortunately, the significant p-value (p = .0256) for the interaction term for mothers’ SAT indicates that the relationship between mothers’ SAT scores and the logit is not linear, which violates our assumption of linearity of the logit. The positive coefficient suggests that as mothers’ SAT increases, it’s effect on the log odds of having a first born son increase in a non-linear fashion.
I will try to do a square root transformation to the standardized mothers’ SAT scores to see if this makes the relationship between the predictor and the logit linear.
# Square root transformation of the standardized mothers' SAT + the constantdyadic_data$sqrt_c_z_m_SAT <-sqrt(dyadic_data$c_z_m_SAT)# Taking the natural log of the square root transformed variable for input into another Box-Tidwell proceduredyadic_data$ln_sqrt_c_z_m_SAT <-log(dyadic_data$sqrt_c_z_m_SAT)# Re-running the box-tidwell procedure with the square-root transformationsqrt_m_SAT_BT_model <-glm(shared_child_sex ~ sqrt_c_z_m_SAT + c_z_f_SAT + sqrt_c_z_m_SAT:ln_sqrt_c_z_m_SAT + c_z_f_SAT:ln_c_z_f_SAT, family =binomial(link = logit), data = dyadic_data)summary(sqrt_m_SAT_BT_model)
Call:
glm(formula = shared_child_sex ~ sqrt_c_z_m_SAT + c_z_f_SAT +
sqrt_c_z_m_SAT:ln_sqrt_c_z_m_SAT + c_z_f_SAT:ln_c_z_f_SAT,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 18.8874 8.1604 2.315 0.0206 *
sqrt_c_z_m_SAT -18.5765 7.9706 -2.331 0.0198 *
c_z_f_SAT 0.6690 1.6497 0.406 0.6851
sqrt_c_z_m_SAT:ln_sqrt_c_z_m_SAT 13.2559 5.8556 2.264 0.0236 *
c_z_f_SAT:ln_c_z_f_SAT -0.3284 0.8944 -0.367 0.7135
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 132.57 on 99 degrees of freedom
AIC: 142.57
Number of Fisher Scoring iterations: 4
The square root transformation did not seem to make a difference, as the p-value is still significant, indicating a non-linear relationship between the predictor and the logit.
I will now try an inverse transformation to see if that makes a difference.
# Taking the inverse of the standardized mothers' SAT + the constantdyadic_data$inv_c_z_m_SAT <-1/(dyadic_data$c_z_m_SAT)# Taking the natural log of the inverse transformed variable for input into another Box-Tidwell proceduredyadic_data$ln_inv_c_z_m_SAT <-log(dyadic_data$inv_c_z_m_SAT)# Re-running the Box-Tidwell procedure with the new transformationsinv_m_SAT_BT_model <-glm(shared_child_sex ~ inv_c_z_m_SAT + c_z_f_SAT + inv_c_z_m_SAT:ln_inv_c_z_m_SAT + c_z_f_SAT:ln_c_z_f_SAT, family =binomial(link = logit), data = dyadic_data)summary(inv_m_SAT_BT_model)
Call:
glm(formula = shared_child_sex ~ inv_c_z_m_SAT + c_z_f_SAT +
inv_c_z_m_SAT:ln_inv_c_z_m_SAT + c_z_f_SAT:ln_c_z_f_SAT,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.2593 2.9230 1.115 0.2648
inv_c_z_m_SAT -2.7209 2.1783 -1.249 0.2116
c_z_f_SAT 0.4155 1.6368 0.254 0.7996
inv_c_z_m_SAT:ln_inv_c_z_m_SAT 7.0060 3.6776 1.905 0.0568 .
c_z_f_SAT:ln_c_z_f_SAT -0.1880 0.8852 -0.212 0.8318
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 133.28 on 99 degrees of freedom
AIC: 143.28
Number of Fisher Scoring iterations: 4
This transformation barely flipped the significance of the p-value for the interaction term.
With these transformations not helping very much, I do not know what else to do. I will therefore finish the analysis using the standardized variables as predictors like we fit in the beginning and avoid over-interpreting confidence intervals or p-values.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ SAT and fathers’ SAT only, along with the Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(SAT_main_effects_model)
Call:
glm(formula = shared_child_sex ~ z_m_SAT + z_f_SAT, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.43013 0.20082 2.142 0.0322 *
z_m_SAT -0.07761 0.20481 -0.379 0.7047
z_f_SAT 0.02009 0.20642 0.097 0.9225
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 139.34 on 101 degrees of freedom
AIC: 145.34
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_SAT_main_effects_model <- SAT_main_effects_model$null.deviance - SAT_main_effects_model$devianceChi_SAT_main_effects_model
[1] 0.1435668
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_SAT_main_effects_model <- SAT_main_effects_model$df.null - SAT_main_effects_model$df.residualdf_SAT_main_effects_model
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_SAT_main_effects_model <-1-pchisq(Chi_SAT_main_effects_model, df_SAT_main_effects_model)prob_Chi_SAT_main_effects_model
[1] 0.9307325
The model with main effects only seems to be a poor fit for the data, and, although the p-value may be biased due to the violation of the linearity assumption, it is very close to one, indicating that the predictors added virtually nothing to the model with only the intercept included.
Now we will take a quick look at the model with the interaction included, and we will compare it’s fit to the model with only the main effects included.
# Producing the summary of the interaction modelsummary(SAT_interaction_model)
Call:
glm(formula = shared_child_sex ~ z_m_SAT + z_f_SAT + z_m_SAT:z_f_SAT,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.41083 0.20398 2.014 0.044 *
z_m_SAT -0.08591 0.20638 -0.416 0.677
z_f_SAT 0.01025 0.20723 0.049 0.961
z_m_SAT:z_f_SAT 0.09996 0.18963 0.527 0.598
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 139.06 on 100 degrees of freedom
AIC: 147.06
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the interaction model compared to the model with only main effects includedChi_SAT_interaction_model_v_main <- SAT_main_effects_model$deviance - SAT_interaction_model$devianceChi_SAT_interaction_model_v_main
[1] 0.2845532
# Calculating the degrees of freedom for the chi-square statistic comparing the model with the interaction to the main-effects modeldf_SAT_interaction_model_v_main <- SAT_main_effects_model$df.residual - SAT_interaction_model$df.residualdf_SAT_interaction_model_v_main
[1] 1
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_SAT_interaction_model_v_main <-1-pchisq(Chi_SAT_interaction_model_v_main, df_SAT_interaction_model_v_main)prob_Chi_SAT_interaction_model_v_main
[1] 0.5937323
The coefficients for each predictor are very small, and they are not close to significant. Similarly, the model with the interaction included does not fit the data better than the main-effects only model.
We will look for whether the interaction model is significant compared to the baseline with only the intercept.
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_SAT_interaction_model <- SAT_interaction_model$null.deviance - SAT_interaction_model$devianceChi_SAT_interaction_model
[1] 0.42812
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_SAT_interaction_model <- SAT_interaction_model$df.null - SAT_interaction_model$df.residualdf_SAT_interaction_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_SAT_interaction_model <-1-pchisq(Chi_SAT_interaction_model, df_SAT_interaction_model)prob_Chi_SAT_interaction_model
[1] 0.9343742
The model is not significant.
Probing for an Interaction
For completion’s sake—although I am almost sure we will not find anything—I figure I may as well take a look at potential differences in slope of the focal predictor (mother’s standardized SAT) at different levels of the moderator (fathers’ standardized SAT). The following is the simple slopes analysis and Johnson-Neyman plot for the interaction model.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(SAT_interaction_model, pred = z_m_SAT, modx = z_f_SAT, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_SAT when z_f_SAT = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------- ------ -------- ------
-0.19 0.29 -0.64 0.52
Slope of z_m_SAT when z_f_SAT = -1.494531e-17 (Mean):
Est. S.E. z val. p
------- ------ -------- ------
-0.09 0.21 -0.42 0.68
Slope of z_m_SAT when z_f_SAT = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.01 0.27 0.05 0.96
As suspected, Johnson-Neyman result indicates that there are no values of fathers’ SAT where mothers’ SAT significantly predicts the probability of having a first-born son.
Although it may be that the estimate of the coefficient for mothers’ SAT scores is attenuated due to a truly non-linear relationship such as a growth curve, with the current modeling framework there does not seem to be evidence that mothers’ SAT positively predicts the probability of having a first born child when fathers’ SAT score is high.
IPIP Dominance
First, we will standardize our independent variables, m_IPIP_dom and f_IPIP_dom.
# Standardizing the m_IPIP_dom and f_IPIP_dom variables as z_m_IPIP_dom and z_f_IPIP_domdyadic_data$z_m_IPIP_dom <-scale(dyadic_data$m_IPIP_dom, center =TRUE, scale =TRUE)dyadic_data$z_f_IPIP_dom <-scale(dyadic_data$f_IPIP_dom, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of IPIP dominance for mothers and fathersIPIP_main_effects_model <-glm(shared_child_sex ~ z_m_IPIP_dom + z_f_IPIP_dom, family =binomial(link = logit), data = dyadic_data)# Fitting the model with the main effects of IPIP dominance for mothers and fathers as well as their interactionIPIP_interaction_model <-glm(shared_child_sex ~ z_m_IPIP_dom + z_f_IPIP_dom + z_m_IPIP_dom:z_f_IPIP_dom, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(IPIP_main_effects_model, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(IPIP_interaction_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(IPIP_main_effects_model)
z_m_IPIP_dom z_f_IPIP_dom
1.063057 1.063057
1/vif(IPIP_main_effects_model)
z_m_IPIP_dom z_f_IPIP_dom
0.9406837 0.9406837
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, and in this case I am pretty sure that the raw IPIP variable does not contain zero or negative numbers. I will check below by looking at the minimum value for the raw IPIP dominance measure for both mothers and fathers and if this is the case I will simply do the log transformation to this variable for use in the Box-Tidwell procedure.
# Checking the minimum value for IPIP dominance for mothers and fathersmin(dyadic_data$m_IPIP_dom, na.rm =TRUE)
[1] 11
min(dyadic_data$f_IPIP_dom, na.rm =TRUE)
[1] 14
Indeed, the mimimum value for the raw variables is positive for both, so I will do the natural log transformation of each of these variables for input as interaction terms to complete the Box-Tidwell procedure.
# Creating ln_m_IPIP_dom, which represents the natural log of mothers' IPIP dominance scoresdyadic_data$ln_m_IPIP_dom <-log(dyadic_data$m_IPIP_dom)# Creating ln_f_IPIP_dom, which represents the natural log of fathers' IPIP dominance scoresdyadic_data$ln_f_IPIP_dom <-log(dyadic_data$f_IPIP_dom)
Through manually doing the transformations for a couple numbers, I believe that this transformation was done correctly.
Now we need to fit and summarize a model with the main effects of each variable plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the BT procedureBT_test_IPIP_model <-glm(shared_child_sex ~ m_IPIP_dom + f_IPIP_dom + m_IPIP_dom:ln_m_IPIP_dom + f_IPIP_dom:ln_f_IPIP_dom, family =binomial(link = logit), data = dyadic_data)# Displaying the resultssummary(BT_test_IPIP_model)
Call:
glm(formula = shared_child_sex ~ m_IPIP_dom + f_IPIP_dom + m_IPIP_dom:ln_m_IPIP_dom +
f_IPIP_dom:ln_f_IPIP_dom, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 12.0595 6.7964 1.774 0.076 .
m_IPIP_dom -0.8261 0.6574 -1.257 0.209
f_IPIP_dom -1.0217 0.8494 -1.203 0.229
m_IPIP_dom:ln_m_IPIP_dom 0.1980 0.1544 1.282 0.200
f_IPIP_dom:ln_f_IPIP_dom 0.2281 0.1939 1.176 0.239
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 134.84 on 99 degrees of freedom
AIC: 144.84
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ IPIP dominance and fathers’ IPIP dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(IPIP_main_effects_model)
Call:
glm(formula = shared_child_sex ~ z_m_IPIP_dom + z_f_IPIP_dom,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4357 0.2022 2.155 0.0312 *
z_m_IPIP_dom 0.1669 0.2114 0.789 0.4298
z_f_IPIP_dom -0.2094 0.2107 -0.994 0.3202
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 138.14 on 101 degrees of freedom
AIC: 144.14
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_IPIP_main_effects_model <- IPIP_main_effects_model$null.deviance - IPIP_main_effects_model$devianceChi_IPIP_main_effects_model
[1] 1.340487
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_IPIP_main_effects_model <- IPIP_main_effects_model$df.null - IPIP_main_effects_model$df.residualdf_IPIP_main_effects_model
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_IPIP_main_effects_model <-1-pchisq(Chi_IPIP_main_effects_model, df_IPIP_main_effects_model)prob_Chi_IPIP_main_effects_model
[1] 0.511584
Looking at the summary of the model, we can see that neither predictor is significant at the .05 level, and the Chi square comparison to test whether the main effects of mother and father IPIP dominance improve the fit of the model is also not significant (χ²(2) = 1.340, p = .512).
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(IPIP_interaction_model)
Call:
glm(formula = shared_child_sex ~ z_m_IPIP_dom + z_f_IPIP_dom +
z_m_IPIP_dom:z_f_IPIP_dom, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.43231 0.20785 2.080 0.0375 *
z_m_IPIP_dom 0.16751 0.21142 0.792 0.4282
z_f_IPIP_dom -0.21086 0.21158 -0.997 0.3190
z_m_IPIP_dom:z_f_IPIP_dom 0.01492 0.21056 0.071 0.9435
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 138.14 on 100 degrees of freedom
AIC: 146.14
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_IPIP_interaction_model <- IPIP_interaction_model$null.deviance - IPIP_interaction_model$devianceChi_IPIP_interaction_model
[1] 1.34551
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_IPIP_interaction_model <- IPIP_interaction_model$df.null - IPIP_interaction_model$df.residualdf_IPIP_interaction_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_IPIP_interaction_model <-1-pchisq(Chi_IPIP_interaction_model, df_IPIP_interaction_model)prob_Chi_IPIP_interaction_model
[1] 0.7183556
The model summary indicates that none of the predictors are significant, including the interaction term. In addition, the model does not fit the data better than the intercept-only version of the model.
Looking at the summary of the interaction model, it would be very surprising that the interaction model would fit any better than the main-effects model, but I will go ahead and do it anyways.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_IPIP_interaction_model_v_main <- IPIP_main_effects_model$deviance - IPIP_interaction_model$devianceChi_IPIP_interaction_model_v_main
[1] 0.005023363
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_IPIP_interaction_model_v_main <- IPIP_main_effects_model$df.residual - IPIP_interaction_model$df.residualdf_IPIP_interaction_model_v_main
As suspected, the Chi-squared test does not indicate that the interaction model fits better than the main effects model (χ²(1) = .005, p = .943).
Again, for completion’s sake, I will run the simple slopes and Johnson-Neyman analysis for the interaction model.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(IPIP_interaction_model, pred = z_m_IPIP_dom, modx = z_f_IPIP_dom, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_IPIP_dom when z_f_IPIP_dom = -1.00 (- 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.15 0.29 0.52 0.60
Slope of z_m_IPIP_dom when z_f_IPIP_dom = 0.00 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.17 0.21 0.79 0.43
Slope of z_m_IPIP_dom when z_f_IPIP_dom = 1.00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.18 0.30 0.60 0.55
As suspected, Johnson-Neyman result indicates that there are no values of fathers’ SAT where mothers’ SAT significantly predicts the probability of having a first-born son.
Hypothesis 2: Dominance Status Measure Operationalization
Now we will test the hypothesis with self-reported dominance status (Cheng et al., 2010).
Self-Reported Dominance Status
First we will standardize our independent variables, m_SR_dom_cheng and f_SR_dom_cheng.
# Standardizing the m_SR_dom_cheng and f_SR_dom_cheng variables as z_m_SR_dom_cheng and z_f_SR_dom_chengdyadic_data$z_m_SR_dom_cheng <-scale(dyadic_data$m_SR_dom_cheng, center =TRUE, scale =TRUE)dyadic_data$z_f_SR_dom_cheng <-scale(dyadic_data$f_SR_dom_cheng, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of IPIP dominance for mothers and fatherssr_cheng_main_effects_model <-glm(shared_child_sex ~ z_m_SR_dom_cheng + z_f_SR_dom_cheng, family =binomial(link = logit), data = dyadic_data)# Fitting the model with the main effects of IPIP dominance for mothers and fathers as well as their interactionsr_cheng_interaction_model <-glm(shared_child_sex ~ z_m_SR_dom_cheng + z_f_SR_dom_cheng + z_m_SR_dom_cheng:z_f_SR_dom_cheng, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(sr_cheng_main_effects_model, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(sr_cheng_interaction_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(sr_cheng_main_effects_model)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, and in this case I am pretty sure that the raw SR dominance variable does not contain zero or negative numbers. I will check below by looking at the minimum value for the raw SR dominance measure for both mothers and fathers and if this is the case I will simply do the log transformation to this variable for use in the Box-Tidwell procedure.
# Checking the minimum value for IPIP dominance for mothers and fathersmin(dyadic_data$m_SR_dom_cheng, na.rm =TRUE)
[1] 1
min(dyadic_data$f_SR_dom_cheng, na.rm =TRUE)
[1] 1.125
Indeed, the mimimum value for the raw variables is positive for both, so I will do the natural log transformation of each of these variables for input as interaction terms to complete the Box-Tidwell procedure.
# Creating ln_m_SR_dom_cheng, which represents the natural log of mothers' SR dominance scoresdyadic_data$ln_m_SR_dom_cheng <-log(dyadic_data$m_SR_dom_cheng)# Creating ln_f_SR_dom_cheng, which represents the natural log of fathers' SR dominance scoresdyadic_data$ln_f_SR_dom_cheng <-log(dyadic_data$f_SR_dom_cheng)
Through manually doing the transformations for a couple numbers, I believe that this transformation was done correctly.
Now we need to fit and summarize a model with the main effects of each variable plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
# Fitting the model for the BT procedureBT_test_SR_cheng_model <-glm(shared_child_sex ~ m_SR_dom_cheng + f_SR_dom_cheng + m_SR_dom_cheng:ln_m_SR_dom_cheng + f_SR_dom_cheng:ln_f_SR_dom_cheng, family =binomial(link = logit), data = dyadic_data)# Displaying the resultssummary(BT_test_SR_cheng_model)
Call:
glm(formula = shared_child_sex ~ m_SR_dom_cheng + f_SR_dom_cheng +
m_SR_dom_cheng:ln_m_SR_dom_cheng + f_SR_dom_cheng:ln_f_SR_dom_cheng,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3412 3.5538 -0.096 0.924
m_SR_dom_cheng -0.1285 1.5976 -0.080 0.936
f_SR_dom_cheng 0.5728 1.9817 0.289 0.773
m_SR_dom_cheng:ln_m_SR_dom_cheng 0.1847 0.7715 0.239 0.811
f_SR_dom_cheng:ln_f_SR_dom_cheng -0.3223 0.9223 -0.349 0.727
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 138.05 on 99 degrees of freedom
AIC: 148.05
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ SR dominance status and fathers’ SR dominance status only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(sr_cheng_main_effects_model)
Call:
glm(formula = shared_child_sex ~ z_m_SR_dom_cheng + z_f_SR_dom_cheng,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4356 0.2022 2.155 0.0312 *
z_m_SR_dom_cheng 0.2541 0.2321 1.095 0.2735
z_f_SR_dom_cheng -0.1218 0.2235 -0.545 0.5857
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 138.23 on 101 degrees of freedom
AIC: 144.23
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_sr_cheng_main_effects_model <- sr_cheng_main_effects_model$null.deviance - sr_cheng_main_effects_model$devianceChi_sr_cheng_main_effects_model
[1] 1.257427
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_sr_cheng_main_effects_model <- sr_cheng_main_effects_model$df.null - sr_cheng_main_effects_model$df.residualdf_sr_cheng_main_effects_model
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_sr_cheng_main_effects_model <-1-pchisq(Chi_sr_cheng_main_effects_model, df_sr_cheng_main_effects_model)prob_Chi_sr_cheng_main_effects_model
[1] 0.5332774
Looking at the summary of the model, we can see that neither predictor is significant at the .05 level, and the Chi square comparison to test whether the main effects of mother and father SR dominance status improve the fit of the model is also not significant (χ²(2) = 1.257, p = .533).
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(sr_cheng_interaction_model)
Call:
glm(formula = shared_child_sex ~ z_m_SR_dom_cheng + z_f_SR_dom_cheng +
z_m_SR_dom_cheng:z_f_SR_dom_cheng, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.5510 0.2297 2.399 0.0164 *
z_m_SR_dom_cheng 0.3576 0.2548 1.403 0.1605
z_f_SR_dom_cheng -0.1062 0.2280 -0.466 0.6412
z_m_SR_dom_cheng:z_f_SR_dom_cheng -0.2644 0.2323 -1.138 0.2552
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 139.49 on 103 degrees of freedom
Residual deviance: 136.91 on 100 degrees of freedom
AIC: 144.91
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_sr_cheng_interaction_model <- sr_cheng_interaction_model$null.deviance - sr_cheng_interaction_model$devianceChi_sr_cheng_interaction_model
[1] 2.572065
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_sr_cheng_interaction_model <- sr_cheng_interaction_model$df.null - sr_cheng_interaction_model$df.residualdf_sr_cheng_interaction_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_sr_cheng_interaction_model <-1-pchisq(Chi_sr_cheng_interaction_model, df_sr_cheng_interaction_model)prob_Chi_sr_cheng_interaction_model
[1] 0.462408
The model summary indicates that none of the predictors are significant, including the interaction term. In addition, the model does not fit the data better than the intercept-only version of the model.
Looking at the estimate and p-value for the interaction term, it would be very unlikely that the interaction model would fit any better than the main-effects model, but I will go ahead and do the comparison anyways.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_sr_cheng_interaction_model_v_main <- sr_cheng_main_effects_model$deviance - sr_cheng_interaction_model$devianceChi_sr_cheng_interaction_model_v_main
[1] 1.314638
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_sr_cheng_interaction_model_v_main <- sr_cheng_main_effects_model$df.residual - sr_cheng_interaction_model$df.residualdf_sr_cheng_interaction_model_v_main
As suspected, the Chi-squared test does not indicate that the interaction model fits better than the main effects model (χ²(1) = 1.315, p = .252).
Again, for completion’s sake, I will run the simple slopes and Johnson-Neyman analysis for the interaction model.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(sr_cheng_interaction_model, pred = z_m_SR_dom_cheng, modx = z_f_SR_dom_cheng, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_SR_dom_cheng when z_f_SR_dom_cheng = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.62 0.41 1.53 0.13
Slope of z_m_SR_dom_cheng when z_f_SR_dom_cheng = -1.561251e-17 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.36 0.25 1.40 0.16
Slope of z_m_SR_dom_cheng when z_f_SR_dom_cheng = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.09 0.27 0.34 0.73
As suspected, Johnson-Neyman result indicates that there are no values of fathers’ SR dominance status where mothers’ SR dominance significantly predicts the probability of having a first-born son.
Hypothesis 2: Facial Dominance Operationalization
Now we will test the hypothesis with the standardized residuals for facial dominance that we created within sexes. Because there are group differences in facial dominance between the neutral and non-neutral facial expression groups, we will conduct this analysis on both the full dyadic dataset and on the neutral faces only.
Full Dataset
First we will standardize our independent variables, m_res_facial_dominance and f_res_facial_dominance.
# Standardizing the m_res_facial_dominance and f_res_facial_dominance variables as z_m_res_facial_dominance and z_f_res_facial_dominancedyadic_data$z_m_res_facial_dominance <-scale(dyadic_data$m_res_facial_dominance, center =TRUE, scale =TRUE)dyadic_data$z_f_res_facial_dominance <-scale(dyadic_data$f_res_facial_dominance, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathersres_fac_dom_main_effects_model <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance, family =binomial(link = logit), data = dyadic_data)# Fitting the model with the main effects of residual facial dominance for mothers and fathers as well as their interactionres_fac_dom_interaction_model <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance + z_m_res_facial_dominance:z_f_res_facial_dominance, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(res_fac_dom_main_effects_model, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(res_fac_dom_interaction_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(res_fac_dom_main_effects_model)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. The below code determines the lowest value of z_m_res_facial_dominance and z_f_res_facial_dominance so that we can add a constant that makes all values for the variable positive before making the natural log transformation.
min(dyadic_data$z_m_res_facial_dominance)
[1] -2.439434
min(dyadic_data$z_f_res_facial_dominance, na.rm =TRUE) # Because there is one NA value in the father data we specify that the algorithm should ignore this value
[1] -3.361094
Given these minimum values, I will add 4 to each variable to make them positive, then I will make a natural log transformation to both.
# Creating c_z_m_res_facial_dominance, which represents mothers' residual facial dominance after adding a constant of 4dyadic_data$c_z_m_res_facial_dominance <- dyadic_data$z_m_res_facial_dominance +4# Creating ln_c_z_m_res_facial_dominance, which represents the natural log of mothers' residual facial dominance after adding a constant of 4dyadic_data$ln_c_z_m_res_facial_dominance <-log(dyadic_data$c_z_m_res_facial_dominance)# Creating c_z_f_res_facial_dominance, which represents fathers' residual facial dominance after adding a constant of 4dyadic_data$c_z_f_res_facial_dominance <- dyadic_data$z_f_res_facial_dominance +4# Creating ln_c_z_f_res_facial_dominance, which represents the natural log of fathers' residual facial dominance after adding a constant of 4dyadic_data$ln_c_z_f_res_facial_dominance <-log(dyadic_data$c_z_f_res_facial_dominance)
Through manually doing the transformations for a couple numbers, I believe that this transformation was done correctly.
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_res_facial_dom_model <-glm(shared_child_sex ~ c_z_m_res_facial_dominance + c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance + c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance, family =binomial(link = logit), data = dyadic_data)summary(BT_test_res_facial_dom_model)
Call:
glm(formula = shared_child_sex ~ c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error
(Intercept) -0.2428 6.4165
c_z_m_res_facial_dominance 2.9280 3.0059
c_z_f_res_facial_dominance -3.3793 2.5450
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -1.2074 1.2521
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 1.6497 1.1157
z value Pr(>|z|)
(Intercept) -0.038 0.970
c_z_m_res_facial_dominance 0.974 0.330
c_z_f_res_facial_dominance -1.328 0.184
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -0.964 0.335
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 1.479 0.139
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 130.80 on 98 degrees of freedom
(1 observation deleted due to missingness)
AIC: 140.8
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(res_fac_dom_main_effects_model)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.42678 0.20505 2.081 0.0374 *
z_m_res_facial_dominance 0.02287 0.20489 0.112 0.9111
z_f_res_facial_dominance 0.37699 0.21672 1.739 0.0820 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 135.15 on 100 degrees of freedom
(1 observation deleted due to missingness)
AIC: 141.15
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_main_effects_model <- res_fac_dom_main_effects_model$null.deviance - res_fac_dom_main_effects_model$devianceChi_res_fac_dom_main_effects_model
[1] 3.3311
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_main_effects_model <- res_fac_dom_main_effects_model$df.null - res_fac_dom_main_effects_model$df.residualdf_res_fac_dom_main_effects_model
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_main_effects_model <-1-pchisq(Chi_res_fac_dom_main_effects_model, df_res_fac_dom_main_effects_model)prob_Chi_res_fac_dom_main_effects_model
[1] 0.1890866
Although the whole model is not a better fit than the baseline model with just the intercept (x2(2) = 3.331, p = .189), the residual facial dominance predictor for fathers was close to significant (b = .377, z = 1.739, p = .082), with a one standard deviation increase in father’s residual facial dominance leading to a .377 unit increase in the log-odds of having a first born son. I will exponentiate this coefficient to make it more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_main_effects_model$coefficients)
The odds ratio for father’s residual facial dominance (although not significant) is OR = 1.457, indicating that while holding mother’s residual facial dominance constant a one standard deviation increase in father’s residual facial dominance is associated with a 45.7% higher odds of having a first born son.
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(res_fac_dom_interaction_model)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance +
z_m_res_facial_dominance:z_f_res_facial_dominance, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value
(Intercept) 0.39794 0.20835 1.910
z_m_res_facial_dominance 0.09678 0.21849 0.443
z_f_res_facial_dominance 0.45865 0.23069 1.988
z_m_res_facial_dominance:z_f_res_facial_dominance 0.40692 0.26321 1.546
Pr(>|z|)
(Intercept) 0.0561 .
z_m_res_facial_dominance 0.6578
z_f_res_facial_dominance 0.0468 *
z_m_res_facial_dominance:z_f_res_facial_dominance 0.1221
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 132.53 on 99 degrees of freedom
(1 observation deleted due to missingness)
AIC: 140.53
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_interaction_model <- res_fac_dom_interaction_model$null.deviance - res_fac_dom_interaction_model$devianceChi_res_fac_dom_interaction_model
[1] 5.944714
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_interaction_model <- res_fac_dom_interaction_model$df.null - res_fac_dom_interaction_model$df.residualdf_res_fac_dom_interaction_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_interaction_model <-1-pchisq(Chi_res_fac_dom_interaction_model, df_res_fac_dom_interaction_model)prob_Chi_res_fac_dom_interaction_model
[1] 0.1143312
Interestingly—although the model still does not fit significantly better than the intercept-only model (x2(3) = 5.945, p = .114)—when the interaction between mothers’ and fathers’ residual facial dominance is added to the model fathers’ residual facial dominance becomes significance (b = .459, z = 1.988, p = .047). Also of note, but not significant, the interaction between mothers’ and fathers’ residual facial dominance has a positive coefficient (b = .407, z = 1.546, p = .122). I will exponentiate these coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_interaction_model$coefficients)
For father’s residual facial dominance, the OR = 1.582, and for the interaction between mothers’ and fathers’ residual facial dominance the OR = 1.502. This indicates that, with other variables held constant, a one standard deviation unit increase in fathers’ residual facial dominance is associated with a 58.2% increase in the odds of having a first born son, and a one unit increase in the product of mothers’ and fathers’ residual facial dominance is associated with a 50.2% increase in the odds of having a first born son.
Because the whole interaction model is not significant compared to the intercept-only model, it would be very surprising if the interaction model would fit better than the main-effects only model, but I will go ahead and do the comparison anyways.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_res_fac_dom_interaction_model_v_main <- res_fac_dom_main_effects_model$deviance - res_fac_dom_interaction_model$devianceChi_res_fac_dom_interaction_model_v_main
[1] 2.613613
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_res_fac_dom_interaction_model_v_main <- res_fac_dom_main_effects_model$df.residual - res_fac_dom_interaction_model$df.residualdf_res_fac_dom_interaction_model_v_main
As suspected, the Chi-squared test does not indicate that the interaction model fits better than the main effects model (x2(1) = 2.614, p = .106).
Although the interaction is not significant in our model, our hypothesis predicts that at high levels of fathers’ facial dominance mothers’ dominance predicts the probability of a first born son, so I will run a simple slopes and Johnson-Neyman analysis with mothers’ residual facial dominance as the focal predictor and fathers’ residual facial dominance as the moderator.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(res_fac_dom_interaction_model, pred = z_m_res_facial_dominance, modx = z_f_res_facial_dominance, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------- ------ -------- ------
-0.31 0.30 -1.03 0.30
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = 1.293464e-17 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.10 0.22 0.44 0.66
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.50 0.38 1.33 0.18
Although nothing here is significant, I will exponentiate the coefficients here to make them more interpretable.
# Producing odds-ratios for each value of the slope for mothers' residual facial dominance for each level of the moderator in the simple slopes analysisexp(-.30) # -1 SD
[1] 0.7408182
exp(.09) # Mean
[1] 1.094174
exp(.49) # +1 SD
[1] 1.632316
# Taking the inverse of the odds-ratio for the first coefficient1-0.7408182
[1] 0.2591818
The Johnson-Neyman analysis indicates that there are no values of fathers’ residual facial dominance for which mothers’ residual facial dominance is a significant predictor of the probability of having a first born child. Worthy of note, however, the simple slopes analysis indicates that at low levels of fathers’ residual facial dominance (-1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 25.9% decrease in the odds of having a first born son and at high levels of fathers’ residual facial dominance (+1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 63.2% increase in the odds of having a first born son.
Although not significant, the direction of this moderation effect is consistent with our hypothesis.
It is possible that in the population there exists such an effect, whereas we simply do not have enough power to detect it.
Neutral Faces Only
Now we will repeat the same analysis as just above but with the neutral faces only. To do this, I will create a new data frame that only contains the neutral faces.
# Creating a new dyadic data frame called neutral_face_dyadic_data where only cases where mothers' and fathers' face was neutral in the photograph are includedneutral_face_dyadic_data <-subset(dyadic_data, m_expression_not_neutral =="neutral"& f_expression_not_neutral =="neutral")
I have checked the new data frame and it seems to have been created correctly, resulting in n = 87 cases.
Now we will standardize our independent variables in the new data frame, m_res_facial_dominance and f_res_facial_dominance.
# Standardizing the m_res_facial_dominance and f_res_facial_dominance variables as z_m_res_facial_dominance and z_f_res_facial_dominance (because the mean and SD may be slightly different in this new data frame)neutral_face_dyadic_data$z_m_res_facial_dominance <-scale(neutral_face_dyadic_data$m_res_facial_dominance, center =TRUE, scale =TRUE)neutral_face_dyadic_data$z_f_res_facial_dominance <-scale(neutral_face_dyadic_data$f_res_facial_dominance, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathersres_fac_dom_main_effects_model_neutral <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)# Fitting the model with the main effects of residual facial dominance for mothers and fathers as well as their interactionres_fac_dom_interaction_model_neutral <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance + z_m_res_facial_dominance:z_f_res_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(res_fac_dom_main_effects_model_neutral, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(res_fac_dom_interaction_model_neutral, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(res_fac_dom_main_effects_model_neutral)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. The below code determines the lowest value of z_m_res_facial_dominance and z_f_res_facial_dominance so that we can add a constant that makes all values for the variable positive before making the natural log transformation.
Given these minimum values, I will add 3 to each variable to make them positive, then I will make a natural log transformation to both.
# Creating c_z_m_res_facial_dominance, which represents mothers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$c_z_m_res_facial_dominance <- neutral_face_dyadic_data$z_m_res_facial_dominance +3# Creating ln_c_z_m_res_facial_dominance, which represents the natural log of mothers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$ln_c_z_m_res_facial_dominance <-log(neutral_face_dyadic_data$c_z_m_res_facial_dominance)# Creating c_z_f_res_facial_dominance, which represents fathers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$c_z_f_res_facial_dominance <- neutral_face_dyadic_data$z_f_res_facial_dominance +3# Creating ln_c_z_f_res_facial_dominance, which represents the natural log of fathers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$ln_c_z_f_res_facial_dominance <-log(neutral_face_dyadic_data$c_z_f_res_facial_dominance)
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_res_facial_dom_model_neutral <-glm(shared_child_sex ~ c_z_m_res_facial_dominance + c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance + c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)summary(BT_test_res_facial_dom_model_neutral)
Call:
glm(formula = shared_child_sex ~ c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance,
family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error
(Intercept) -2.9930 3.6877
c_z_m_res_facial_dominance 1.3959 2.2373
c_z_f_res_facial_dominance 0.1359 1.9328
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -0.5843 1.0558
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 0.2251 0.9547
z value Pr(>|z|)
(Intercept) -0.812 0.417
c_z_m_res_facial_dominance 0.624 0.533
c_z_f_res_facial_dominance 0.070 0.944
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -0.553 0.580
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 0.236 0.814
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 103.27 on 76 degrees of freedom
AIC: 113.27
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(res_fac_dom_main_effects_model_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance,
family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3523 0.2355 1.496 0.1346
z_m_res_facial_dominance 0.1596 0.2367 0.674 0.5000
z_f_res_facial_dominance 0.6038 0.2668 2.263 0.0236 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 103.61 on 78 degrees of freedom
AIC: 109.61
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_main_effects_model_neutral <- res_fac_dom_main_effects_model_neutral$null.deviance - res_fac_dom_main_effects_model_neutral$devianceChi_res_fac_dom_main_effects_model_neutral
[1] 6.589181
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_main_effects_model_neutral <- res_fac_dom_main_effects_model_neutral$df.null - res_fac_dom_main_effects_model_neutral$df.residualdf_res_fac_dom_main_effects_model_neutral
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_main_effects_model_neutral <-1-pchisq(Chi_res_fac_dom_main_effects_model_neutral, df_res_fac_dom_main_effects_model_neutral)prob_Chi_res_fac_dom_main_effects_model_neutral
[1] 0.03708323
Despite the loss of power, the full main effects model is significant (x2(2) = 6.589, p = .037). The predictor for fathers’ residual facial dominance is also significant (b = .604, z = 2.263, p = .024). I will exponentiate the coefficients to make this more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_main_effects_model_neutral$coefficients)
The odds ratio for father’s residual facial dominance is OR = 1.829, indicating that while holding mother’s residual facial dominance constant a one standard deviation increase in father’s residual facial dominance is associated with a 83% higher odds of having a first born son.
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(res_fac_dom_interaction_model_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance +
z_m_res_facial_dominance:z_f_res_facial_dominance, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value
(Intercept) 0.3620 0.2395 1.511
z_m_res_facial_dominance 0.2463 0.2549 0.966
z_f_res_facial_dominance 0.7024 0.2879 2.440
z_m_res_facial_dominance:z_f_res_facial_dominance 0.3274 0.2669 1.227
Pr(>|z|)
(Intercept) 0.1307
z_m_res_facial_dominance 0.3338
z_f_res_facial_dominance 0.0147 *
z_m_res_facial_dominance:z_f_res_facial_dominance 0.2199
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 102.00 on 77 degrees of freedom
AIC: 110
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_interaction_model_neutral <- res_fac_dom_interaction_model_neutral$null.deviance - res_fac_dom_interaction_model_neutral$devianceChi_res_fac_dom_interaction_model_neutral
[1] 8.189911
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_interaction_model_neutral <- res_fac_dom_interaction_model_neutral$df.null - res_fac_dom_interaction_model_neutral$df.residualdf_res_fac_dom_interaction_model_neutral
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_interaction_model_neutral <-1-pchisq(Chi_res_fac_dom_interaction_model_neutral, df_res_fac_dom_interaction_model_neutral)prob_Chi_res_fac_dom_interaction_model_neutral
[1] 0.04224562
The full interaction model was significant (x2(3) = 8.19, p = .042), and although the interaction term is not significant, when the interaction term was added to the model the coefficient became larger for fathers’ residual facial dominance (b = .702, z = 2.440, p = .015). Again, I will exponentiate this coefficient to understand it better.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_interaction_model_neutral$coefficients)
The odds ratio for father’s residual facial dominance is now 2.02, indicating that while holding mother’s residual facial dominance constant a one standard deviation increase in father’s residual facial dominance is associated with a 102% higher odds of having a first born son.
Now to compare the main-effects model with the main-effects and interaction model.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_res_fac_dom_interaction_model_v_main_neutral <- res_fac_dom_main_effects_model_neutral$deviance - res_fac_dom_interaction_model_neutral$devianceChi_res_fac_dom_interaction_model_v_main_neutral
[1] 1.60073
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_res_fac_dom_interaction_model_v_main_neutral <- res_fac_dom_main_effects_model_neutral$df.residual - res_fac_dom_interaction_model_neutral$df.residualdf_res_fac_dom_interaction_model_v_main_neutral
The Chi-squared test does not indicate that the interaction model fits better than the main effects model (x2(1) = 1.601, p = .206).
Although the interaction is not significant in our model, our hypothesis predicts that at high levels of fathers’ facial dominance mothers’ dominance predicts the probability of a first born son, so I will run a simple slopes and Johnson-Neyman analysis with mothers’ residual facial dominance as the focal predictor and fathers’ residual facial dominance as the moderator.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(res_fac_dom_interaction_model_neutral, pred = z_m_res_facial_dominance, modx = z_f_res_facial_dominance, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------- ------ -------- ------
-0.08 0.31 -0.26 0.79
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = -2.193033e-17 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.25 0.25 0.97 0.33
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.57 0.42 1.37 0.17
Although nothing here is significant, I will exponentiate the coefficients here to make them more interpretable.
# Producing odds-ratios for each value of the slope for mothers' residual facial dominance for each level of the moderator in the simple slopes analysisexp(-.08) # -1 SD
[1] 0.9231163
exp(.24) # Mean
[1] 1.271249
exp(.56) # +1 SD
[1] 1.750673
# Taking the inverse of the odds-ratio for the first coefficient1-0.9231163
[1] 0.0768837
The Johnson-Neyman analysis indicates that there are no values of fathers’ residual facial dominance for which mothers’ residual facial dominance is a significant predictor of the probability of having a first born child. Similar to the analysis with all facial expression types, however, the simple slopes analysis indicates that at low levels of fathers’ residual facial dominance (-1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 7.7% decrease in the odds of having a first born son; at the mean for fathers’ residual facial dominance a one standard deviation increase in mothers’ residual facial dominance is associated with an 27.1% increase in the odds of having a first born son; and at high levels of fathers’ residual facial dominance (+1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 75.1% increase in the odds of having a first born son.
Although not significant, the direction of this moderation effect is consistent with our hypothesis, like in the model with all facial expressions included.
Again, it is possible that in the population there exists such an effect, whereas we simply do not have enough power to detect it.
Testing Hypothesis 3
To test hypothesis three, we will (1) look at whether fathers’ fWHR is significantly correlated with their residual facial dominance (and I will also include the other facial rating characteristics) either in the full dyadic dataset or in the neutral facial expression dyadic dataset, then, if so, (2) test whether fathers’ fWHR significantly differs by offspring sex, and (3) substitute fathers’ fWHR for residual facial dominance in the same binary logistic regression models as the previous section. In these latter two analyses, we should expect to see that as fathers’ fWHR increases the probability of having a first-born son also increases.
Correlating fWHR with Facial Dominance
Here are the bivariate correlations between fathers’ fWHR, ratings of attractiveness, ratings of masculinity/femininity, ratings of dominance, and residual facial dominance.
# Calculate pairwise correlations with p-values and confidence intervals for both datasetscorr4 <-corr.test(dyadic_data[, c("f_fWHR", "f_res_facial_dominance", "f_facial_masculinityfemininity", "f_facial_dominance", "f_facial_attractiveness")], use="pairwise.complete.obs")# Calculate pairwise correlations with p-values and confidence intervals for both datasetscorr5 <-corr.test(dyadic_data[, c("m_fWHR", "m_res_facial_dominance", "m_facial_masculinityfemininity", "m_facial_dominance", "m_facial_attractiveness")], use="pairwise.complete.obs")print(corr4, short=FALSE)
Substituting Fathers’ fHWR for Fathers’ Facial Dominance in Previous Models
To test Hypothesis 3 more rigorously, we will replace fathers’ facial dominance in the previous binary logistic regression models predicting shared child sex with fathers’ fWHR.
Full Dataset
First we will standardize our new independent variablee, f_fWHR.
# Standardizing the f_fWHR variable as z_f_fWHRdyadic_data$z_f_fWHR <-scale(dyadic_data$f_fWHR, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathers' fWHRhyp_3_main_all <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR, family =binomial(link = logit), data = dyadic_data)# Fitting the model with the main effects of residual facial dominance for mothers and fathers' fWHR as well as their interactionhyp_3_interaction_all <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR + z_m_res_facial_dominance:z_f_fWHR, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(hyp_3_main_all, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(hyp_3_interaction_all, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(hyp_3_main_all)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. We will add 10 to make z_f_fWHR all positive before fitting the Box-Tidwell model.
# Creating c_z_f_fWHR, which represents standardized fathers' fWHR after adding a constant of 10dyadic_data$c_z_f_fWHR <- dyadic_data$z_f_fWHR +10# Creating ln_c_z_m_res_facial_dominance, which represents the natural log of standardized fathers' fWHR after adding a constant of 10dyadic_data$ln_c_z_f_fWHR <-log(dyadic_data$c_z_f_fWHR)
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_hyp_3_main_all <-glm(shared_child_sex ~ c_z_m_res_facial_dominance + c_z_f_fWHR + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance + c_z_f_fWHR:ln_c_z_f_fWHR, family =binomial(link = logit), data = dyadic_data)summary(BT_test_hyp_3_main_all)
Call:
glm(formula = shared_child_sex ~ c_z_m_res_facial_dominance +
c_z_f_fWHR + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance +
c_z_f_fWHR:ln_c_z_f_fWHR, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error
(Intercept) -25.770 36.592
c_z_m_res_facial_dominance 4.725 3.249
c_z_f_fWHR 5.368 11.919
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -1.909 1.350
c_z_f_fWHR:ln_c_z_f_fWHR -1.539 3.583
z value Pr(>|z|)
(Intercept) -0.704 0.481
c_z_m_res_facial_dominance 1.454 0.146
c_z_f_fWHR 0.450 0.652
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -1.414 0.157
c_z_f_fWHR:ln_c_z_f_fWHR -0.429 0.668
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.14 on 92 degrees of freedom
Residual deviance: 119.44 on 88 degrees of freedom
(11 observations deleted due to missingness)
AIC: 129.44
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ fWHR only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(hyp_3_main_all)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4668 0.2160 2.162 0.0307 *
z_m_res_facial_dominance 0.1432 0.2118 0.676 0.4989
z_f_fWHR 0.3086 0.2272 1.358 0.1744
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.14 on 92 degrees of freedom
Residual deviance: 121.97 on 90 degrees of freedom
(11 observations deleted due to missingness)
AIC: 127.97
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_main_all <- hyp_3_main_all$null.deviance - hyp_3_main_all$devianceChi_hyp_3_main_all
[1] 2.175194
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_main_all <- hyp_3_main_all$df.null - hyp_3_main_all$df.residualdf_hyp_3_main_all
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_main_all <-1-pchisq(Chi_hyp_3_main_all, df_hyp_3_main_all)prob_Chi_hyp_3_main_all
[1] 0.3370255
The model is not a better fit than the baseline model with just the intercept (x2(2) = 2.175, p = .337), and the fathers’ fWHR predictor is also not significant (b = .309, z = 1.358, p = .174). Still, I will exponentiate the coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_3_main_all$coefficients)
The odds ratio for father’s residual facial dominance (although not significant) is OR = 1.362, indicating that while holding mother’s residual facial dominance constant a one standard deviation increase in father’s residual facial dominance is associated with a 36.2% higher odds of having a first born son.
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(hyp_3_interaction_all)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR +
z_m_res_facial_dominance:z_f_fWHR, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.48170 0.21971 2.192 0.0283 *
z_m_res_facial_dominance 0.17475 0.22623 0.772 0.4399
z_f_fWHR 0.31392 0.22902 1.371 0.1705
z_m_res_facial_dominance:z_f_fWHR 0.09505 0.22659 0.419 0.6749
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.14 on 92 degrees of freedom
Residual deviance: 121.79 on 89 degrees of freedom
(11 observations deleted due to missingness)
AIC: 129.79
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_interaction_all <- hyp_3_interaction_all$null.deviance - hyp_3_interaction_all$devianceChi_hyp_3_interaction_all
[1] 2.353127
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_interaction_all <- hyp_3_interaction_all$df.null - hyp_3_interaction_all$df.residualdf_hyp_3_interaction_all
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_interaction_all <-1-pchisq(Chi_hyp_3_interaction_all, df_hyp_3_interaction_all)prob_Chi_hyp_3_interaction_all
[1] 0.5024196
The model still does not fit significantly better than the intercept-only model (x2(3) = 2.353, p = .502) when the interaction between mothers’ residual facial dominance and fathers’ fWHR is added to the model. Similarly, fathers’ fWHR remains not significant (b = .314, z = 1.371, p = .171). I will exponentiate these coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_3_interaction_all$coefficients)
For father’s fWHR, the odds-ratio is virtually the same as in the main effects model
Because the whole interaction model is not significant compared to the intercept-only model, it would be very surprising if the interaction model would fit better than the main-effects only model, but I will go ahead and do the comparison anyways.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_hyp_3_interaction_v_main_all <- hyp_3_main_all$deviance - hyp_3_interaction_all$devianceChi_hyp_3_interaction_v_main_all
[1] 0.1779338
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_hyp_3_interaction_v_main_all <- hyp_3_main_all$df.residual - hyp_3_interaction_all$df.residualdf_hyp_3_interaction_v_main_all
As suspected, the Chi-squared test does not indicate that the interaction model fits better than the main effects model (x2(1) = 0.178, p = .673).
Neutral Faces Only
Now we will repeat the same analysis as just above but with the neutral faces only.
Now we will standardize our new independent variables in the neutral faces data frame, f_fWHR.
# Standardizing the f_fWHR variable as z_f_fWHRneutral_face_dyadic_data$z_f_fWHR <-scale(neutral_face_dyadic_data$f_fWHR, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathers fWHRhyp_3_main_neutral <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR, family =binomial(link = logit), data = neutral_face_dyadic_data)# Fitting the model with only the main effects of residual facial dominance for mothers and fathers fWHR plus their interactionhyp_3_interaction_neutral <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR + z_m_res_facial_dominance:z_f_fWHR, family =binomial(link = logit), data = neutral_face_dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(hyp_3_main_neutral, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(hyp_3_interaction_neutral, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(hyp_3_main_neutral)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. I will add 10 to the f_fWHR variable to make it positive, then I will make a natural log transformation to it.
# Creating c_z_f_fWHR, which represents standardized fathers' fWHR after adding a constant of 10neutral_face_dyadic_data$c_z_f_fWHR <- neutral_face_dyadic_data$z_f_fWHR +10# Creating ln_c_z_f_fWHR, which represents the natural log of standardized fathers' fWHR after adding a constant of 10neutral_face_dyadic_data$ln_c_z_f_fWHR <-log(neutral_face_dyadic_data$c_z_f_fWHR)
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_hyp_3_main_neutral <-glm(shared_child_sex ~ c_z_m_res_facial_dominance + c_z_f_fWHR + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance + c_z_f_fWHR:ln_c_z_f_fWHR, family =binomial(link = logit), data = neutral_face_dyadic_data)summary(BT_test_hyp_3_main_neutral)
Call:
glm(formula = shared_child_sex ~ c_z_m_res_facial_dominance +
c_z_f_fWHR + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance +
c_z_f_fWHR:ln_c_z_f_fWHR, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error
(Intercept) -20.054 37.250
c_z_m_res_facial_dominance 2.360 2.185
c_z_f_fWHR 4.890 12.072
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -1.016 1.034
c_z_f_fWHR:ln_c_z_f_fWHR -1.386 3.625
z value Pr(>|z|)
(Intercept) -0.538 0.590
c_z_m_res_facial_dominance 1.080 0.280
c_z_f_fWHR 0.405 0.685
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -0.982 0.326
c_z_f_fWHR:ln_c_z_f_fWHR -0.382 0.702
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 100.952 on 74 degrees of freedom
Residual deviance: 97.606 on 70 degrees of freedom
(6 observations deleted due to missingness)
AIC: 107.61
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Model
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(hyp_3_main_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR,
family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4103 0.2397 1.712 0.087 .
z_m_res_facial_dominance 0.2245 0.2394 0.938 0.348
z_f_fWHR 0.3056 0.2529 1.208 0.227
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 100.952 on 74 degrees of freedom
Residual deviance: 98.734 on 72 degrees of freedom
(6 observations deleted due to missingness)
AIC: 104.73
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_main_neutral <- hyp_3_main_neutral$null.deviance - hyp_3_main_neutral$devianceChi_hyp_3_main_neutral
[1] 2.217739
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_main_neutral <- hyp_3_main_neutral$df.null - hyp_3_main_neutral$df.residualdf_hyp_3_main_neutral
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_main_neutral <-1-pchisq(Chi_hyp_3_main_neutral, df_hyp_3_main_neutral)prob_Chi_hyp_3_main_neutral
[1] 0.3299318
The full main effects model is not significant (x2(2) = 2.218, p = .330). The predictor for fathers’ fWHR is also not significant (b = .305, z = 1.208, p = .227). I will exponentiate the coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_3_main_neutral$coefficients)
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(hyp_3_interaction_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_fWHR +
z_m_res_facial_dominance:z_f_fWHR, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4891 0.2522 1.940 0.0524 .
z_m_res_facial_dominance 0.3745 0.2681 1.397 0.1625
z_f_fWHR 0.3551 0.2772 1.281 0.2001
z_m_res_facial_dominance:z_f_fWHR 0.4956 0.3086 1.606 0.1083
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 100.952 on 74 degrees of freedom
Residual deviance: 95.755 on 71 degrees of freedom
(6 observations deleted due to missingness)
AIC: 103.76
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_interaction_neutral <- hyp_3_interaction_neutral$null.deviance - hyp_3_interaction_neutral$devianceChi_hyp_3_interaction_neutral
[1] 5.196411
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_interaction_neutral <- hyp_3_interaction_neutral$df.null - hyp_3_interaction_neutral$df.residualdf_hyp_3_interaction_neutral
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_interaction_neutral <-1-pchisq(Chi_hyp_3_interaction_neutral, df_hyp_3_interaction_neutral)prob_Chi_hyp_3_interaction_neutral
[1] 0.1579672
The full interaction model was not significant (x2(3) = 5.196, p = .157), and neither is the predictor for fathers’ fWHR (b = .355, z = 1.281, p = .20). Again, I will exponentiate these coefficients.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_3_interaction_neutral$coefficients)
Now to compare the main-effects model with the main-effects and interaction model.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_res_fac_dom_interaction_model_v_main_neutral <- hyp_3_main_neutral$deviance - hyp_3_interaction_neutral$devianceChi_res_fac_dom_interaction_model_v_main_neutral
[1] 2.978672
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_res_fac_dom_interaction_model_v_main_neutral <- hyp_3_main_neutral$df.residual - hyp_3_interaction_neutral$df.residualdf_res_fac_dom_interaction_model_v_main_neutral
The Chi-squared test, although close, indicates that the interaction model does not fit significantly better than the main effects model (x2(1) = 2.979, p = .084).
Predicting Offspring Sex from Mother and Father fWHR:
Full Dataset
First we will standardize our new independent variables, m_fWHR and f_fWHR.
# Standardizing the f_fWHR variable as z_f_fWHR and m_fWHR as z_m_fWHRdyadic_data$z_f_fWHR <-scale(dyadic_data$f_fWHR, center =TRUE, scale =TRUE)dyadic_data$z_m_fWHR <-scale(dyadic_data$m_fWHR, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of mothers' fWHR and fathers' fWHRhyp_3_pred_OSR_all <-glm(shared_child_sex ~ z_m_fWHR + z_f_fWHR, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the modelplot(hyp_3_pred_OSR_all, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(hyp_3_pred_OSR_all)
z_m_fWHR z_f_fWHR
1.031947 1.031947
1/vif(hyp_3_pred_OSR_all)
z_m_fWHR z_f_fWHR
0.9690425 0.9690425
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. We will add 10 to make z_f_fWHR all positive before fitting the Box-Tidwell model.
# Creating c_z_f_fWHR, which represents standardized fathers' fWHR after adding a constant of 10dyadic_data$c_z_f_fWHR <- dyadic_data$z_f_fWHR +10# Creating c_z_m_fWHR, which represents standardized mothers' fWHR after adding a constant of 10dyadic_data$c_z_m_fWHR <- dyadic_data$z_m_fWHR +10# Creating ln_c_z_f_fWHR, which represents the natural log of standardized fathers' fWHR after adding a constant of 10dyadic_data$ln_c_z_f_fWHR <-log(dyadic_data$c_z_f_fWHR)# Creating ln_c_z_m_fWHR, which represents the natural log of standardized fathers' fWHR after adding a constant of 10dyadic_data$ln_c_z_m_fWHR <-log(dyadic_data$c_z_m_fWHR)
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_hyp_3_pred_OSR_all <-glm(shared_child_sex ~ c_z_m_fWHR + c_z_f_fWHR + c_z_m_fWHR:ln_c_z_m_fWHR + c_z_f_fWHR:ln_c_z_f_fWHR, family =binomial(link = logit), data = dyadic_data)summary(BT_test_hyp_3_pred_OSR_all)
Call:
glm(formula = shared_child_sex ~ c_z_m_fWHR + c_z_f_fWHR + c_z_m_fWHR:ln_c_z_m_fWHR +
c_z_f_fWHR:ln_c_z_f_fWHR, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -16.3126 50.7747 -0.321 0.748
c_z_m_fWHR 1.7528 11.6365 0.151 0.880
c_z_f_fWHR 3.6840 12.1441 0.303 0.762
c_z_m_fWHR:ln_c_z_m_fWHR -0.5963 3.4874 -0.171 0.864
c_z_f_fWHR:ln_c_z_f_fWHR -1.0348 3.6543 -0.283 0.777
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 109.5 on 80 degrees of freedom
Residual deviance: 107.4 on 76 degrees of freedom
(23 observations deleted due to missingness)
AIC: 117.4
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ fWHR only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(hyp_3_pred_OSR_all)
Call:
glm(formula = shared_child_sex ~ z_m_fWHR + z_f_fWHR, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4184 0.2320 1.804 0.0712 .
z_m_fWHR -0.2458 0.2370 -1.037 0.2996
z_f_fWHR 0.2502 0.2307 1.085 0.2780
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 109.50 on 80 degrees of freedom
Residual deviance: 107.52 on 78 degrees of freedom
(23 observations deleted due to missingness)
AIC: 113.52
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_pred_OSR_all <- hyp_3_pred_OSR_all$null.deviance - hyp_3_pred_OSR_all$devianceChi_hyp_3_pred_OSR_all
[1] 1.979839
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_pred_OSR_all <- hyp_3_pred_OSR_all$df.null - hyp_3_pred_OSR_all$df.residualdf_hyp_3_pred_OSR_all
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_pred_OSR_all <-1-pchisq(Chi_hyp_3_pred_OSR_all, df_hyp_3_pred_OSR_all)prob_Chi_hyp_3_pred_OSR_all
[1] 0.3716066
The model is not a better fit than the baseline model with just the intercept (x2(2) = 1.98, p = .372), and neither of the predictors are significant. Still, I will exponentiate the coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_3_pred_OSR_all$coefficients)
Now we will look at whether fathers’ fWHR predicts offspring sex without controlling for mothers’ facial dominance.
Fitting the model:
hyp_3_f_pred_OSR_all <-glm(shared_child_sex ~ z_f_fWHR, family =binomial(link = logit), data = dyadic_data)
Summarizing the model and testing it against the intercept-only model:
# Producing the summary of the modelsummary(hyp_3_f_pred_OSR_all)
Call:
glm(formula = shared_child_sex ~ z_f_fWHR, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4695 0.2154 2.179 0.0293 *
z_f_fWHR 0.2873 0.2238 1.284 0.1992
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.14 on 92 degrees of freedom
Residual deviance: 122.43 on 91 degrees of freedom
(11 observations deleted due to missingness)
AIC: 126.43
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_f_pred_OSR_all <- hyp_3_f_pred_OSR_all$null.deviance - hyp_3_f_pred_OSR_all$devianceChi_hyp_3_f_pred_OSR_all
[1] 1.713106
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_f_pred_OSR_all <- hyp_3_f_pred_OSR_all$df.null - hyp_3_f_pred_OSR_all$df.residualdf_hyp_3_f_pred_OSR_all
[1] 1
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_f_pred_OSR_all <-1-pchisq(Chi_hyp_3_f_pred_OSR_all, df_hyp_3_f_pred_OSR_all)prob_Chi_hyp_3_f_pred_OSR_all
[1] 0.1905829
Neither the model, nor the predictor, are significant at the .05 level.
Mothers’ fWHR Only
Now we will look at whether fathers’ fWHR predicts offspring sex without controlling for mothers’ facial dominance.
Fitting the model:
hyp_3_m_pred_OSR_all <-glm(shared_child_sex ~ z_m_fWHR, family =binomial(link = logit), data = dyadic_data)
Summarizing the model and testing it against the intercept-only model:
# Producing the summary of the modelsummary(hyp_3_m_pred_OSR_all)
Call:
glm(formula = shared_child_sex ~ z_m_fWHR, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.31422 0.21363 1.471 0.141
z_m_fWHR -0.08784 0.21442 -0.410 0.682
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 122.58 on 89 degrees of freedom
Residual deviance: 122.41 on 88 degrees of freedom
(14 observations deleted due to missingness)
AIC: 126.41
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_m_pred_OSR_all <- hyp_3_m_pred_OSR_all$null.deviance - hyp_3_m_pred_OSR_all$devianceChi_hyp_3_m_pred_OSR_all
[1] 0.1679021
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_m_pred_OSR_all <- hyp_3_m_pred_OSR_all$df.null - hyp_3_m_pred_OSR_all$df.residualdf_hyp_3_m_pred_OSR_all
[1] 1
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_m_pred_OSR_all <-1-pchisq(Chi_hyp_3_m_pred_OSR_all, df_hyp_3_m_pred_OSR_all)prob_Chi_hyp_3_m_pred_OSR_all
[1] 0.681983
Neither the model, nor the predictor, are significant at the .05 level.
Neutral Faces Only
Now we will repeat the same analysis as just above but with the neutral faces only.
We will first standardize our new independent variables in the neutral faces data frame, f_fWHR.
# Standardizing the f_fWHR variable as z_f_fWHR and m_fWHR as z_m_fWHRneutral_face_dyadic_data$z_f_fWHR <-scale(neutral_face_dyadic_data$f_fWHR, center =TRUE, scale =TRUE)neutral_face_dyadic_data$z_m_fWHR <-scale(neutral_face_dyadic_data$m_fWHR, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathers fWHRhyp_3_pred_OSR_neutral <-glm(shared_child_sex ~ z_m_fWHR + z_f_fWHR, family =binomial(link = logit), data = neutral_face_dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(hyp_3_pred_OSR_neutral, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(hyp_3_pred_OSR_neutral)
z_m_fWHR z_f_fWHR
1.081309 1.081309
1/vif(hyp_3_pred_OSR_neutral)
z_m_fWHR z_f_fWHR
0.9248049 0.9248049
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. I will add 10 to the f_fWHR variable to make it positive, then I will make a natural log transformation to it.
# Creating c_z_f_fWHR, which represents standardized fathers' fWHR after adding a constant of 10neutral_face_dyadic_data$c_z_f_fWHR <- neutral_face_dyadic_data$z_f_fWHR +10# Creating c_z_m_fWHR, which represents standardized mothers' fWHR after adding a constant of 10neutral_face_dyadic_data$c_z_m_fWHR <- neutral_face_dyadic_data$z_m_fWHR +10# Creating ln_c_z_f_fWHR, which represents the natural log of standardized fathers' fWHR after adding a constant of 10neutral_face_dyadic_data$ln_c_z_f_fWHR <-log(neutral_face_dyadic_data$c_z_f_fWHR)# Creating ln_c_z_m_fWHR, which represents the natural log of standardized fathers' fWHR after adding a constant of 10neutral_face_dyadic_data$ln_c_z_m_fWHR <-log(neutral_face_dyadic_data$c_z_m_fWHR)
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_hyp_3_pred_OSR_neutral <-glm(shared_child_sex ~ c_z_m_fWHR + c_z_f_fWHR + c_z_m_fWHR:ln_c_z_m_fWHR + c_z_f_fWHR:ln_c_z_f_fWHR, family =binomial(link = logit), data = neutral_face_dyadic_data)summary(BT_test_hyp_3_pred_OSR_neutral)
Call:
glm(formula = shared_child_sex ~ c_z_m_fWHR + c_z_f_fWHR + c_z_m_fWHR:ln_c_z_m_fWHR +
c_z_f_fWHR:ln_c_z_f_fWHR, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.9371 51.7111 0.115 0.909
c_z_m_fWHR -5.4289 12.8844 -0.421 0.673
c_z_f_fWHR 3.3566 12.3263 0.272 0.785
c_z_m_fWHR:ln_c_z_m_fWHR 1.5864 3.8894 0.408 0.683
c_z_f_fWHR:ln_c_z_f_fWHR -0.9313 3.7075 -0.251 0.802
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 92.792 on 67 degrees of freedom
Residual deviance: 91.309 on 63 degrees of freedom
(13 observations deleted due to missingness)
AIC: 101.31
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Model
Here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(hyp_3_pred_OSR_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_fWHR + z_f_fWHR, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3288 0.2499 1.315 0.188
z_m_fWHR -0.1840 0.2626 -0.701 0.483
z_f_fWHR 0.2708 0.2638 1.027 0.305
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 92.792 on 67 degrees of freedom
Residual deviance: 91.520 on 65 degrees of freedom
(13 observations deleted due to missingness)
AIC: 97.52
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_pred_OSR_neutral <- hyp_3_pred_OSR_neutral$null.deviance - hyp_3_pred_OSR_neutral$devianceChi_hyp_3_pred_OSR_neutral
[1] 1.272484
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_pred_OSR_neutral <- hyp_3_pred_OSR_neutral$df.null - hyp_3_pred_OSR_neutral$df.residualdf_hyp_3_pred_OSR_neutral
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_pred_OSR_neutral <-1-pchisq(Chi_hyp_3_pred_OSR_neutral, df_hyp_3_pred_OSR_neutral)prob_Chi_hyp_3_pred_OSR_neutral
[1] 0.5292776
The full model is not significant (x2(2) = 1.27, p = .529). Neither of the predictors are significant. I will still exponentiate the coefficients to check them.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(hyp_3_pred_OSR_neutral$coefficients)
Now we will look at whether fathers’ fWHR predicts offspring sex without controlling for mothers’ facial dominance.
Fitting the model:
hyp_3_f_pred_OSR_neutral <-glm(shared_child_sex ~ z_f_fWHR, family =binomial(link = logit), data = neutral_face_dyadic_data)
Summarizing the model and testing it against the intercept-only model:
# Producing the summary of the modelsummary(hyp_3_f_pred_OSR_neutral)
Call:
glm(formula = shared_child_sex ~ z_f_fWHR, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4141 0.2383 1.738 0.0823 .
z_f_fWHR 0.2803 0.2491 1.125 0.2605
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 100.952 on 74 degrees of freedom
Residual deviance: 99.631 on 73 degrees of freedom
(6 observations deleted due to missingness)
AIC: 103.63
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_f_pred_OSR_neutral <- hyp_3_f_pred_OSR_neutral$null.deviance - hyp_3_f_pred_OSR_neutral$devianceChi_hyp_3_f_pred_OSR_neutral
[1] 1.320907
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_f_pred_OSR_neutral <- hyp_3_f_pred_OSR_neutral$df.null - hyp_3_f_pred_OSR_neutral$df.residualdf_hyp_3_f_pred_OSR_neutral
[1] 1
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_f_pred_OSR_neutral <-1-pchisq(Chi_hyp_3_f_pred_OSR_neutral, df_hyp_3_f_pred_OSR_neutral)prob_Chi_hyp_3_f_pred_OSR_neutral
[1] 0.2504293
Neither the model, nor the predictor, are significant at the .05 level.
Mothers’ fWHR Only
Now we will look at whether fathers’ fWHR predicts offspring sex without controlling for mothers’ facial dominance.
Fitting the model:
hyp_3_m_pred_OSR_neutral <-glm(shared_child_sex ~ z_m_fWHR, family =binomial(link = logit), data = neutral_face_dyadic_data)
Summarizing the model and testing it against the intercept-only model:
# Producing the summary of the modelsummary(hyp_3_m_pred_OSR_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_fWHR, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.19240 0.23518 0.818 0.413
z_m_fWHR 0.02353 0.23689 0.099 0.921
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 100.53 on 72 degrees of freedom
Residual deviance: 100.52 on 71 degrees of freedom
(8 observations deleted due to missingness)
AIC: 104.52
Number of Fisher Scoring iterations: 3
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_hyp_3_m_pred_OSR_neutral <- hyp_3_m_pred_OSR_neutral$null.deviance - hyp_3_m_pred_OSR_neutral$devianceChi_hyp_3_m_pred_OSR_neutral
[1] 0.009870298
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_hyp_3_m_pred_OSR_neutral <- hyp_3_m_pred_OSR_neutral$df.null - hyp_3_m_pred_OSR_neutral$df.residualdf_hyp_3_m_pred_OSR_neutral
[1] 1
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_hyp_3_m_pred_OSR_neutral <-1-pchisq(Chi_hyp_3_m_pred_OSR_neutral, df_hyp_3_m_pred_OSR_neutral)prob_Chi_hyp_3_m_pred_OSR_neutral
[1] 0.9208609
Neither the model, nor the predictor, are significant at the .05 level.
Summary for Hypothesis 3
Although the direction of the effects here are as predicted, there are no significant relationships between fathers’ fWHR and sex of first born child in these models. It is worth noting that we have a loss of power due to some faces not having fWHR measurements. However, overall these analyses do not provide strong evidence for Hypothesis 3.
Addressing Critiques from EHB Editor
An editor from EHB brought to our attention a few critiques that we may be able to address with additional analyses:
By removing the effects of facial masculinity/femininity, attractiveness, and age from facial dominance ratings, interpretation becomes more complicated, so we should run the analyses with raw facial dominance as well.
We are limited in our conclusions because data collection occurred around 4 years after the birth of the first child on average. We could potentially address this by controlling for age.
In the following, I will run analyses that address these critiques.
Raw Facial Dominance Model
Hypothesis 2: Raw Facial Dominance
Now we will retest the hypothesis with facial dominance as such. Because there are group differences in facial dominance between the neutral and non-neutral facial expression groups, we will conduct this analysis on both the full dyadic dataset and on the neutral faces only.
Full Dataset
First we will standardize our independent variables, m_facial_dominance and f_facial_dominance.
# Standardizing the m_facial_dominance and f_facial_dominance variables as z_m_facial_dominance and z_f_facial_dominancedyadic_data$z_m_facial_dominance <-scale(dyadic_data$m_facial_dominance, center =TRUE, scale =TRUE)dyadic_data$z_f_facial_dominance <-scale(dyadic_data$f_facial_dominance, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathersfac_dom_main_effects_model <-glm(shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance, family =binomial(link = logit), data = dyadic_data)# Fitting the model with the main effects of residual facial dominance for mothers and fathers as well as their interactionfac_dom_interaction_model <-glm(shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance + z_m_facial_dominance:z_f_facial_dominance, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(fac_dom_main_effects_model, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(fac_dom_interaction_model, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(fac_dom_main_effects_model)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. The below code determines the lowest value of z_m_facial_dominance and z_f_facial_dominance so that we can add a constant that makes all values for the variable positive before making the natural log transformation.
min(dyadic_data$z_m_facial_dominance)
[1] -2.221293
min(dyadic_data$z_f_facial_dominance, na.rm =TRUE) # Because there is one NA value in the father data we specify that the algorithm should ignore this value
[1] -2.250934
Given these minimum values, I will add 3 to each variable to make them positive, then I will make a natural log transformation to both.
# Creating c_z_m_facial_dominance, which represents mothers' facial dominance after adding a constant of 3dyadic_data$c_z_m_facial_dominance <- dyadic_data$z_m_facial_dominance +3# Creating ln_c_z_m_facial_dominance, which represents the natural log of mothers' facial dominance after adding a constant of 3dyadic_data$ln_c_z_m_facial_dominance <-log(dyadic_data$c_z_m_facial_dominance)# Creating c_z_f_facial_dominance, which represents fathers' facial dominance after adding a constant of 3dyadic_data$c_z_f_facial_dominance <- dyadic_data$z_f_facial_dominance +3# Creating ln_c_z_f_facial_dominance, which represents the natural log of fathers' facial dominance after adding a constant of 3dyadic_data$ln_c_z_f_facial_dominance <-log(dyadic_data$c_z_f_facial_dominance)
Through manually doing the transformations for a couple numbers, I believe that this transformation was done correctly.
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_facial_dom_model <-glm(shared_child_sex ~ c_z_m_facial_dominance + c_z_f_facial_dominance + c_z_m_facial_dominance:ln_c_z_m_facial_dominance + c_z_f_facial_dominance:ln_c_z_f_facial_dominance, family =binomial(link = logit), data = dyadic_data)summary(BT_test_facial_dom_model)
Call:
glm(formula = shared_child_sex ~ c_z_m_facial_dominance + c_z_f_facial_dominance +
c_z_m_facial_dominance:ln_c_z_m_facial_dominance + c_z_f_facial_dominance:ln_c_z_f_facial_dominance,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value
(Intercept) -5.65315 3.75784 -1.504
c_z_m_facial_dominance 4.30284 2.38592 1.803
c_z_f_facial_dominance 0.17066 2.48779 0.069
c_z_m_facial_dominance:ln_c_z_m_facial_dominance -2.06279 1.14918 -1.795
c_z_f_facial_dominance:ln_c_z_f_facial_dominance -0.05533 1.21132 -0.046
Pr(>|z|)
(Intercept) 0.1325
c_z_m_facial_dominance 0.0713 .
c_z_f_facial_dominance 0.9453
c_z_m_facial_dominance:ln_c_z_m_facial_dominance 0.0727 .
c_z_f_facial_dominance:ln_c_z_f_facial_dominance 0.9636
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 134.17 on 98 degrees of freedom
(1 observation deleted due to missingness)
AIC: 144.17
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(fac_dom_main_effects_model)
Call:
glm(formula = shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.41452 0.20153 2.057 0.0397 *
z_m_facial_dominance 0.03685 0.20376 0.181 0.8565
z_f_facial_dominance 0.07664 0.20461 0.375 0.7080
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 138.28 on 100 degrees of freedom
(1 observation deleted due to missingness)
AIC: 144.28
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_fac_dom_main_effects_model <- fac_dom_main_effects_model$null.deviance - fac_dom_main_effects_model$devianceChi_fac_dom_main_effects_model
[1] 0.1965799
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_fac_dom_main_effects_model <- fac_dom_main_effects_model$df.null - fac_dom_main_effects_model$df.residualdf_fac_dom_main_effects_model
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_fac_dom_main_effects_model <-1-pchisq(Chi_fac_dom_main_effects_model, df_fac_dom_main_effects_model)prob_Chi_fac_dom_main_effects_model
[1] 0.906386
Although the whole model is not a better fit than the baseline model with just the intercept (x2(2) = .197, p = .906). In addition, neither of the predictors are close to significant. I will exponentiate these coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(fac_dom_main_effects_model$coefficients)
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(fac_dom_interaction_model)
Call:
glm(formula = shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance +
z_m_facial_dominance:z_f_facial_dominance, family = binomial(link = logit),
data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.39030 0.20363 1.917 0.0553
z_m_facial_dominance 0.05891 0.20767 0.284 0.7767
z_f_facial_dominance 0.07431 0.20792 0.357 0.7208
z_m_facial_dominance:z_f_facial_dominance 0.22360 0.18020 1.241 0.2147
(Intercept) .
z_m_facial_dominance
z_f_facial_dominance
z_m_facial_dominance:z_f_facial_dominance
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 136.66 on 99 degrees of freedom
(1 observation deleted due to missingness)
AIC: 144.66
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_fac_dom_interaction_model <- fac_dom_interaction_model$null.deviance - fac_dom_interaction_model$devianceChi_fac_dom_interaction_model
[1] 1.812384
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_fac_dom_interaction_model <- fac_dom_interaction_model$df.null - fac_dom_interaction_model$df.residualdf_fac_dom_interaction_model
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_fac_dom_interaction_model <-1-pchisq(Chi_fac_dom_interaction_model, df_fac_dom_interaction_model)prob_Chi_fac_dom_interaction_model
[1] 0.6122438
The model is still not significant (x2(3) = 1.812, p = .612). None of the predictors, including the interaction, are significant. I will exponentiate these coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(fac_dom_interaction_model$coefficients)
Because the whole interaction model is not significant compared to the intercept-only model, it would be very surprising if the interaction model would fit better than the main-effects only model, but I will go ahead and do the comparison anyways.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_fac_dom_interaction_model_v_main <- fac_dom_main_effects_model$deviance - fac_dom_interaction_model$devianceChi_fac_dom_interaction_model_v_main
[1] 1.615804
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_fac_dom_interaction_model_v_main <- fac_dom_main_effects_model$df.residual - fac_dom_interaction_model$df.residualdf_fac_dom_interaction_model_v_main
As suspected, the Chi-squared test does not indicate that the interaction model fits better than the main effects model (x2(1) = 1.616, p = .204).
Although the interaction is not significant in our model, our hypothesis predicts that at high levels of fathers’ facial dominance mothers’ dominance predicts the probability of a first born son, so I will run a simple slopes and Johnson-Neyman analysis with mothers’ residual facial dominance as the focal predictor and fathers’ residual facial dominance as the moderator.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(fac_dom_interaction_model, pred = z_m_facial_dominance, modx = z_f_facial_dominance, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_facial_dominance when z_f_facial_dominance = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------- ------ -------- ------
-0.16 0.26 -0.63 0.53
Slope of z_m_facial_dominance when z_f_facial_dominance = 1.567247e-15 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.06 0.21 0.28 0.78
Slope of z_m_facial_dominance when z_f_facial_dominance = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.28 0.29 0.99 0.32
Although nothing here is significant, I will exponentiate the coefficients here to make them more interpretable.
# Producing odds-ratios for each value of the slope for mothers' residual facial dominance for each level of the moderator in the simple slopes analysisexp(-.16) # -1 SD
[1] 0.8521438
exp(.06) # Mean
[1] 1.061837
exp(.28) # +1 SD
[1] 1.32313
# Taking the inverse of the odds-ratio for the first coefficient1-0.8521438
[1] 0.1478562
The Johnson-Neyman analysis indicates that there are no values of fathers’ facial dominance for which mothers’ facial dominance is a significant predictor of the probability of having a first born son. Worthy of note, however, the simple slopes analysis indicates that at low levels of fathers’ residual facial dominance (-1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 14.8% decrease in the odds of having a first born son and at high levels of fathers’ residual facial dominance (+1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 32.3% increase in the odds of having a first born son.
Although not significant, the direction of this moderation effect is consistent with our hypothesis.
It is possible that in the population there exists such an effect, whereas we simply do not have enough power to detect it.
Neutral Faces Only
Now we will repeat the same analysis as just above but with the neutral faces only. I will first standardize our independent variables in the new data frame, m_facial_dominance and f_facial_dominance.
# Standardizing the m_facial_dominance and f_facial_dominance variables as z_m_facial_dominance and z_f_facial_dominance (because the mean and SD may be slightly different in this new data frame)neutral_face_dyadic_data$z_m_facial_dominance <-scale(neutral_face_dyadic_data$m_facial_dominance, center =TRUE, scale =TRUE)neutral_face_dyadic_data$z_f_facial_dominance <-scale(neutral_face_dyadic_data$f_facial_dominance, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathersfac_dom_main_effects_model_neutral <-glm(shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)# Fitting the model with the main effects of residual facial dominance for mothers and fathers as well as their interactionfac_dom_interaction_model_neutral <-glm(shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance + z_m_facial_dominance:z_f_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(fac_dom_main_effects_model_neutral, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(fac_dom_interaction_model_neutral, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(fac_dom_main_effects_model_neutral)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. The below code determines the lowest value of z_m_facial_dominance and z_f_facial_dominance so that we can add a constant that makes all values for the variable positive before making the natural log transformation.
Given these minimum values, I will add 3 to each variable to make them positive, then I will make a natural log transformation to both.
# Creating c_z_m_facial_dominance, which represents mothers' facial dominance after adding a constant of 3neutral_face_dyadic_data$c_z_m_facial_dominance <- neutral_face_dyadic_data$z_m_facial_dominance +3# Creating ln_c_z_m_facial_dominance, which represents the natural log of mothers' facial dominance after adding a constant of 3neutral_face_dyadic_data$ln_c_z_m_facial_dominance <-log(neutral_face_dyadic_data$c_z_m_facial_dominance)# Creating c_z_f_facial_dominance, which represents fathers' facial dominance after adding a constant of 3neutral_face_dyadic_data$c_z_f_facial_dominance <- neutral_face_dyadic_data$z_f_facial_dominance +3# Creating ln_c_z_f_facial_dominance, which represents the natural log of fathers' facial dominance after adding a constant of 3neutral_face_dyadic_data$ln_c_z_f_facial_dominance <-log(neutral_face_dyadic_data$c_z_f_facial_dominance)
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_facial_dom_model_neutral <-glm(shared_child_sex ~ c_z_m_facial_dominance + c_z_f_facial_dominance + c_z_m_facial_dominance:ln_c_z_m_facial_dominance + c_z_f_facial_dominance:ln_c_z_f_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)summary(BT_test_facial_dom_model_neutral)
Call:
glm(formula = shared_child_sex ~ c_z_m_facial_dominance + c_z_f_facial_dominance +
c_z_m_facial_dominance:ln_c_z_m_facial_dominance + c_z_f_facial_dominance:ln_c_z_f_facial_dominance,
family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value
(Intercept) -4.33791 3.73137 -1.163
c_z_m_facial_dominance 3.15706 2.53723 1.244
c_z_f_facial_dominance -0.03659 2.42403 -0.015
c_z_m_facial_dominance:ln_c_z_m_facial_dominance -1.43669 1.23970 -1.159
c_z_f_facial_dominance:ln_c_z_f_facial_dominance 0.08294 1.20964 0.069
Pr(>|z|)
(Intercept) 0.245
c_z_m_facial_dominance 0.213
c_z_f_facial_dominance 0.988
c_z_m_facial_dominance:ln_c_z_m_facial_dominance 0.246
c_z_f_facial_dominance:ln_c_z_f_facial_dominance 0.945
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 107.11 on 76 degrees of freedom
AIC: 117.11
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(fac_dom_main_effects_model_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance,
family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3297 0.2274 1.450 0.147
z_m_facial_dominance 0.2309 0.2306 1.002 0.317
z_f_facial_dominance 0.1446 0.2300 0.629 0.529
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 108.70 on 78 degrees of freedom
AIC: 114.7
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_fac_dom_main_effects_model_neutral <- fac_dom_main_effects_model_neutral$null.deviance - fac_dom_main_effects_model_neutral$devianceChi_fac_dom_main_effects_model_neutral
[1] 1.493213
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_fac_dom_main_effects_model_neutral <- fac_dom_main_effects_model_neutral$df.null - fac_dom_main_effects_model_neutral$df.residualdf_fac_dom_main_effects_model_neutral
[1] 2
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_fac_dom_main_effects_model_neutral <-1-pchisq(Chi_fac_dom_main_effects_model_neutral, df_fac_dom_main_effects_model_neutral)prob_Chi_fac_dom_main_effects_model_neutral
[1] 0.4739722
The full main effects model is not significant compared to the intercept-only model (x2(2) = 1.493, p = .474). Neither of the predictors are significant. I will exponentiate the coefficients to make this more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(fac_dom_main_effects_model_neutral$coefficients)
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(fac_dom_interaction_model_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_facial_dominance + z_f_facial_dominance +
z_m_facial_dominance:z_f_facial_dominance, family = binomial(link = logit),
data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3264 0.2281 1.431 0.152
z_m_facial_dominance 0.2473 0.2331 1.061 0.289
z_f_facial_dominance 0.1409 0.2299 0.613 0.540
z_m_facial_dominance:z_f_facial_dominance 0.1123 0.1905 0.589 0.556
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 108.35 on 77 degrees of freedom
AIC: 116.35
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_fac_dom_interaction_model_neutral <- fac_dom_interaction_model_neutral$null.deviance - fac_dom_interaction_model_neutral$devianceChi_fac_dom_interaction_model_neutral
[1] 1.841549
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_fac_dom_interaction_model_neutral <- fac_dom_interaction_model_neutral$df.null - fac_dom_interaction_model_neutral$df.residualdf_fac_dom_interaction_model_neutral
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_fac_dom_interaction_model_neutral <-1-pchisq(Chi_fac_dom_interaction_model_neutral, df_fac_dom_interaction_model_neutral)prob_Chi_fac_dom_interaction_model_neutral
[1] 0.6059354
The full interaction model is not significant compared to the intercept only (x2(3) = 1.842, p = .606). None of the predictors are significant. Again, I will exponentiate these coefficients to understand it better.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(fac_dom_interaction_model_neutral$coefficients)
Now to compare the main-effects model with the main-effects and interaction model.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_fac_dom_interaction_model_v_main_neutral <- fac_dom_main_effects_model_neutral$deviance - fac_dom_interaction_model_neutral$devianceChi_fac_dom_interaction_model_v_main_neutral
[1] 0.3483355
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_fac_dom_interaction_model_v_main_neutral <- fac_dom_main_effects_model_neutral$df.residual - fac_dom_interaction_model_neutral$df.residualdf_fac_dom_interaction_model_v_main_neutral
The Chi-squared test does not indicate that the interaction model does not fit better than the main effects model (x2(1) = .348, p = .555).
Although the interaction is not significant in our model, our hypothesis predicts that at high levels of fathers’ facial dominance mothers’ dominance predicts the probability of a first born son, so I will run a simple slopes and Johnson-Neyman analysis with mothers’ residual facial dominance as the focal predictor and fathers’ residual facial dominance as the moderator.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(fac_dom_interaction_model_neutral, pred = z_m_facial_dominance, modx = z_f_facial_dominance, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_facial_dominance when z_f_facial_dominance = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.14 0.28 0.48 0.63
Slope of z_m_facial_dominance when z_f_facial_dominance = 3.974873e-16 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.25 0.23 1.06 0.29
Slope of z_m_facial_dominance when z_f_facial_dominance = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.36 0.32 1.12 0.26
Although nothing here is significant, I will exponentiate the coefficients here to make them more interpretable.
# Producing odds-ratios for each value of the slope for mothers' residual facial dominance for each level of the moderator in the simple slopes analysisexp(-.14) # -1 SD
[1] 0.8693582
exp(.25) # Mean
[1] 1.284025
exp(.36) # +1 SD
[1] 1.433329
# Taking the inverse of the odds-ratio for the first coefficient1-0.8693582
[1] 0.1306418
The Johnson-Neyman analysis indicates that there are no values of fathers’ facial dominance for which mothers’ facial dominance is a significant predictor of the probability of having a first born son. Similar to the analysis with all facial expression types, however, the simple slopes analysis indicates that at low levels of fathers’ residual facial dominance (-1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 13.1% decrease in the odds of having a first born son; at the mean for fathers’ residual facial dominance a one standard deviation increase in mothers’ residual facial dominance is associated with an 28.4% increase in the odds of having a first born son; and at high levels of fathers’ residual facial dominance (+1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 43.3% increase in the odds of having a first born son.
Although not significant, the direction of this moderation effect is consistent with our hypothesis, like in the model with all facial expressions included.
Again, it is possible that in the population there exists such an effect, whereas we simply do not have enough power to detect it.
Controlling for Age of First-Born
Hypothesis 2: Facial Dominance Operationalization
Now we will test the hypothesis with the standardized residuals for facial dominance that we created within sexes, while controlling shared_child_age. Because there are group differences in facial dominance between the neutral and non-neutral facial expression groups, we will conduct this analysis on both the full dyadic dataset and on the neutral faces only.
Full Dataset
Now we will fit our two models, one with the interaction term and one without (both with shared_child_age as a covariate).
# Fitting the model with only the main effects of residual facial dominance for mothers and fathersres_fac_dom_main_effects_model_a <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance + shared_child_age, family =binomial(link = logit), data = dyadic_data)# Fitting the model with the main effects of residual facial dominance for mothers and fathers as well as their interactionres_fac_dom_interaction_model_a <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance + shared_child_age + z_m_res_facial_dominance:z_f_res_facial_dominance, family =binomial(link = logit), data = dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(res_fac_dom_main_effects_model_a, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(res_fac_dom_interaction_model_a, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(res_fac_dom_main_effects_model_a)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. The below code determines the lowest value of z_m_res_facial_dominance and z_f_res_facial_dominance so that we can add a constant that makes all values for the variable positive before making the natural log transformation.
min(dyadic_data$z_m_res_facial_dominance)
[1] -2.439434
min(dyadic_data$z_f_res_facial_dominance, na.rm =TRUE) # Because there is one NA value in the father data we specify that the algorithm should ignore this value
[1] -3.361094
Given these minimum values, I will add 4 to each variable to make them positive, then I will make a natural log transformation to both.
# Creating c_z_m_res_facial_dominance, which represents mothers' residual facial dominance after adding a constant of 4dyadic_data$c_z_m_res_facial_dominance <- dyadic_data$z_m_res_facial_dominance +4# Creating ln_c_z_m_res_facial_dominance, which represents the natural log of mothers' residual facial dominance after adding a constant of 4dyadic_data$ln_c_z_m_res_facial_dominance <-log(dyadic_data$c_z_m_res_facial_dominance)# Creating c_z_f_res_facial_dominance, which represents fathers' residual facial dominance after adding a constant of 4dyadic_data$c_z_f_res_facial_dominance <- dyadic_data$z_f_res_facial_dominance +4# Creating ln_c_z_f_res_facial_dominance, which represents the natural log of fathers' residual facial dominance after adding a constant of 4dyadic_data$ln_c_z_f_res_facial_dominance <-log(dyadic_data$c_z_f_res_facial_dominance)
Through manually doing the transformations for a couple numbers, I believe that this transformation was done correctly.
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_res_facial_dom_model <-glm(shared_child_sex ~ c_z_m_res_facial_dominance + c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance + c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance, family =binomial(link = logit), data = dyadic_data)summary(BT_test_res_facial_dom_model)
Call:
glm(formula = shared_child_sex ~ c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error
(Intercept) -0.2428 6.4165
c_z_m_res_facial_dominance 2.9280 3.0059
c_z_f_res_facial_dominance -3.3793 2.5450
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -1.2074 1.2521
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 1.6497 1.1157
z value Pr(>|z|)
(Intercept) -0.038 0.970
c_z_m_res_facial_dominance 0.974 0.330
c_z_f_res_facial_dominance -1.328 0.184
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -0.964 0.335
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 1.479 0.139
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 130.80 on 98 degrees of freedom
(1 observation deleted due to missingness)
AIC: 140.8
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(res_fac_dom_main_effects_model_a)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance +
shared_child_age, family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.18791 0.43067 0.436 0.663
z_m_res_facial_dominance 0.01830 0.20554 0.089 0.929
z_f_res_facial_dominance 0.37273 0.21639 1.723 0.085 .
shared_child_age 0.06000 0.09576 0.627 0.531
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 134.75 on 99 degrees of freedom
(1 observation deleted due to missingness)
AIC: 142.75
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_main_effects_model_a <- res_fac_dom_main_effects_model_a$null.deviance - res_fac_dom_main_effects_model_a$devianceChi_res_fac_dom_main_effects_model_a
[1] 3.725526
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_main_effects_model_a <- res_fac_dom_main_effects_model_a$df.null - res_fac_dom_main_effects_model_a$df.residualdf_res_fac_dom_main_effects_model_a
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_main_effects_model_a <-1-pchisq(Chi_res_fac_dom_main_effects_model_a, df_res_fac_dom_main_effects_model_a)prob_Chi_res_fac_dom_main_effects_model_a
[1] 0.2926683
Although the whole model is not a better fit than the baseline model with just the intercept (x2(3) = 3.726, p = .293), the residual facial dominance predictor for fathers was close to significant (b = .373, z = 1.723, p = .085), with a one standard deviation increase in father’s residual facial dominance leading to a .373 unit increase in the log-odds of having a first born son. I will exponentiate this coefficient to make it more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_main_effects_model_a$coefficients)
The odds ratio for father’s residual facial dominance (although not significant) is OR = 1.452, indicating that while holding mother’s residual facial dominance constant a one standard deviation increase in father’s residual facial dominance is associated with a 45.2% higher odds of having a first born son.
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(res_fac_dom_interaction_model_a)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance +
shared_child_age + z_m_res_facial_dominance:z_f_res_facial_dominance,
family = binomial(link = logit), data = dyadic_data)
Coefficients:
Estimate Std. Error z value
(Intercept) 0.10565 0.43861 0.241
z_m_res_facial_dominance 0.09536 0.21997 0.434
z_f_res_facial_dominance 0.45774 0.23094 1.982
shared_child_age 0.07295 0.09703 0.752
z_m_res_facial_dominance:z_f_res_facial_dominance 0.42393 0.26611 1.593
Pr(>|z|)
(Intercept) 0.8097
z_m_res_facial_dominance 0.6646
z_f_res_facial_dominance 0.0475 *
shared_child_age 0.4521
z_m_res_facial_dominance:z_f_res_facial_dominance 0.1111
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.48 on 102 degrees of freedom
Residual deviance: 131.96 on 98 degrees of freedom
(1 observation deleted due to missingness)
AIC: 141.96
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_interaction_model_a <- res_fac_dom_interaction_model_a$null.deviance - res_fac_dom_interaction_model_a$devianceChi_res_fac_dom_interaction_model_a
[1] 6.514159
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_interaction_model_a <- res_fac_dom_interaction_model_a$df.null - res_fac_dom_interaction_model_a$df.residualdf_res_fac_dom_interaction_model_a
[1] 4
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_interaction_model_a <-1-pchisq(Chi_res_fac_dom_interaction_model_a, df_res_fac_dom_interaction_model_a)prob_Chi_res_fac_dom_interaction_model_a
[1] 0.1639004
Interestingly—although the model still does not fit significantly better than the intercept-only model (x2(4) = 6.514, p = .164)—when the interaction between mothers’ and fathers’ residual facial dominance is added to the model fathers’ residual facial dominance becomes significance (b = .458, z = 1.982, p = .048). Also of note, but not significant, the interaction between mothers’ and fathers’ residual facial dominance has a positive coefficient (b = .424, z = 1.593, p = .111). I will exponentiate these coefficients to make them more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_interaction_model_a$coefficients)
For father’s residual facial dominance, the OR = 1.581, and for the interaction between mothers’ and fathers’ residual facial dominance the OR = 1.528. This indicates that, with other variables held constant, a one standard deviation unit increase in fathers’ residual facial dominance is associated with a 58.1% increase in the odds of having a first born son, and a one unit increase in the product of mothers’ and fathers’ residual facial dominance is associated with a 52.8% increase in the odds of having a first born son (while controlling for child’s’ age).
Because the whole interaction model is not significant compared to the intercept-only model, it would be very surprising if the interaction model would fit better than the main-effects only model, but I will go ahead and do the comparison anyways.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_res_fac_dom_interaction_model_a_v_main <- res_fac_dom_main_effects_model_a$deviance - res_fac_dom_interaction_model_a$devianceChi_res_fac_dom_interaction_model_a_v_main
[1] 2.788633
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_res_fac_dom_interaction_model_a_v_main <- res_fac_dom_main_effects_model_a$df.residual - res_fac_dom_interaction_model_a$df.residualdf_res_fac_dom_interaction_model_a_v_main
As suspected, the Chi-squared test does not indicate that the interaction model fits better than the main effects model (x2(1) = 2.789, p = .09).
Although the interaction is not significant in our model, our hypothesis predicts that at high levels of fathers’ facial dominance mothers’ dominance predicts the probability of a first born son, so I will run a simple slopes and Johnson-Neyman analysis with mothers’ residual facial dominance as the focal predictor and fathers’ residual facial dominance as the moderator.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(res_fac_dom_interaction_model_a, pred = z_m_res_facial_dominance, modx = z_f_res_facial_dominance, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------- ------ -------- ------
-0.33 0.30 -1.08 0.28
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = 1.293464e-17 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.10 0.22 0.43 0.66
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.52 0.38 1.36 0.17
Although nothing here is significant, I will exponentiate the coefficients here to make them more interpretable.
# Producing odds-ratios for each value of the slope for mothers' residual facial dominance for each level of the moderator in the simple slopes analysisexp(-.33) # -1 SD
[1] 0.7189237
exp(.10) # Mean
[1] 1.105171
exp(.52) # +1 SD
[1] 1.682028
# Taking the inverse of the odds-ratio for the first coefficient1-0.7189237
[1] 0.2810763
The Johnson-Neyman analysis indicates that there are no values of fathers’ residual facial dominance for which mothers’ residual facial dominance is a significant predictor of the probability of having a first born child. Worthy of note, however, the simple slopes analysis indicates that at low levels of fathers’ residual facial dominance (-1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 28.1% decrease in the odds of having a first born son and at high levels of fathers’ residual facial dominance (+1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 68.2% increase in the odds of having a first born son.
Although not significant, the direction of this moderation effect is consistent with our hypothesis.
It is possible that in the population there exists such an effect, whereas we simply do not have enough power to detect it.
Neutral Faces Only
Now we will repeat the same analysis as just above but with the neutral faces only.
First, we will standardize our independent variables in the new data frame, m_res_facial_dominance and f_res_facial_dominance.
# Standardizing the m_res_facial_dominance and f_res_facial_dominance variables as z_m_res_facial_dominance and z_f_res_facial_dominance (because the mean and SD may be slightly different in this new data frame)neutral_face_dyadic_data$z_m_res_facial_dominance <-scale(neutral_face_dyadic_data$m_res_facial_dominance, center =TRUE, scale =TRUE)neutral_face_dyadic_data$z_f_res_facial_dominance <-scale(neutral_face_dyadic_data$f_res_facial_dominance, center =TRUE, scale =TRUE)
Now we will fit our two models, one with the interaction term and one without.
# Fitting the model with only the main effects of residual facial dominance for mothers and fathersres_fac_dom_main_effects_model_a_neutral <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance + shared_child_age, family =binomial(link = logit), data = neutral_face_dyadic_data)# Fitting the model with the main effects of residual facial dominance for mothers and fathers as well as their interactionres_fac_dom_interaction_model_a_neutral <-glm(shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance + shared_child_age + z_m_res_facial_dominance:z_f_res_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)
Assumptions
Now we need to check our assumptions.
Dichotomous DV (satisfied)
One or more continuous IVs, continuous or nominal (satisfied)
Independence of observations (satisfied)
Categories of dichotomous DV and nominal IVs are mutually exclusive and exhaustive (satisfied)
No outliers: Will assess by plotting Cooks distance for each case in each models below (and identify the top 10 cases by number).
# Plotting Cook's distance for the main effects modelplot(res_fac_dom_main_effects_model_a_neutral, which =4, id.n =10)
#Plotting Cook's distance for the interaction modelplot(res_fac_dom_interaction_model_a_neutral, which =4, id.n =10)
We can see that none of the cases come close to the common threshold of 1.
No excessive multicollinearity: Will assess with VIF and tolerance statistics for the main effects model only because it cannot handle interactions.
# VIF and tolerance for the main effects modelvif(res_fac_dom_main_effects_model_a_neutral)
The VIF and tolerance statistics are well within the reasonable range.
Linear relationship between continuous IVs and logit of DV: Box-Tidwell Approach
We need to create the natural log transformations of each of the continuous IVs, but because the ln of a negative number is undefined we must first transform our predictor variables to all be positive by adding a constant. The below code determines the lowest value of z_m_res_facial_dominance and z_f_res_facial_dominance so that we can add a constant that makes all values for the variable positive before making the natural log transformation.
Given these minimum values, I will add 3 to each variable to make them positive, then I will make a natural log transformation to both.
# Creating c_z_m_res_facial_dominance, which represents mothers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$c_z_m_res_facial_dominance <- neutral_face_dyadic_data$z_m_res_facial_dominance +3# Creating ln_c_z_m_res_facial_dominance, which represents the natural log of mothers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$ln_c_z_m_res_facial_dominance <-log(neutral_face_dyadic_data$c_z_m_res_facial_dominance)# Creating c_z_f_res_facial_dominance, which represents fathers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$c_z_f_res_facial_dominance <- neutral_face_dyadic_data$z_f_res_facial_dominance +3# Creating ln_c_z_f_res_facial_dominance, which represents the natural log of fathers' residual facial dominance after adding a constant of 3neutral_face_dyadic_data$ln_c_z_f_res_facial_dominance <-log(neutral_face_dyadic_data$c_z_f_res_facial_dominance)
Now we need to fit and summarize a model with the main effects of each variable with the constant transformation plus the interaction of those variables with their natural log transformations. The summary should indicate that neither of the interaction terms are significant, thereby satisfying the assumption of linearity of the logit.
BT_test_res_facial_dom_model_neutral <-glm(shared_child_sex ~ c_z_m_res_facial_dominance + c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance + c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance, family =binomial(link = logit), data = neutral_face_dyadic_data)summary(BT_test_res_facial_dom_model_neutral)
Call:
glm(formula = shared_child_sex ~ c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance + c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance +
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance,
family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error
(Intercept) -2.9930 3.6877
c_z_m_res_facial_dominance 1.3959 2.2373
c_z_f_res_facial_dominance 0.1359 1.9328
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -0.5843 1.0558
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 0.2251 0.9547
z value Pr(>|z|)
(Intercept) -0.812 0.417
c_z_m_res_facial_dominance 0.624 0.533
c_z_f_res_facial_dominance 0.070 0.944
c_z_m_res_facial_dominance:ln_c_z_m_res_facial_dominance -0.553 0.580
c_z_f_res_facial_dominance:ln_c_z_f_res_facial_dominance 0.236 0.814
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 103.27 on 76 degrees of freedom
AIC: 113.27
Number of Fisher Scoring iterations: 4
Neither of the interaction terms are significant, indicating that we should not reject the null hypothesis that there is a linear relationship between our continuous predictors and the logit.
Summary of the Models
First, here is the summary of the model with the main effects of mothers’ residual facial dominance and fathers’ residual facial dominance only, along with a Chi-square test for whether the model with the main effects fits significantly better than the model with only the intercept.
# Producing the summary of the main effects modelsummary(res_fac_dom_main_effects_model_a_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance +
shared_child_age, family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.12378 0.50792 0.244 0.8075
z_m_res_facial_dominance 0.15950 0.23749 0.672 0.5018
z_f_res_facial_dominance 0.60848 0.26840 2.267 0.0234 *
shared_child_age 0.05699 0.11271 0.506 0.6131
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 103.35 on 77 degrees of freedom
AIC: 111.35
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_main_effects_model_a_neutral <- res_fac_dom_main_effects_model_a_neutral$null.deviance - res_fac_dom_main_effects_model_a_neutral$devianceChi_res_fac_dom_main_effects_model_a_neutral
[1] 6.845574
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_main_effects_model_a_neutral <- res_fac_dom_main_effects_model_a_neutral$df.null - res_fac_dom_main_effects_model_a_neutral$df.residualdf_res_fac_dom_main_effects_model_a_neutral
[1] 3
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_main_effects_model_a_neutral <-1-pchisq(Chi_res_fac_dom_main_effects_model_a_neutral, df_res_fac_dom_main_effects_model_a_neutral)prob_Chi_res_fac_dom_main_effects_model_a_neutral
[1] 0.07698619
With the loss of power, the full main effects model just not significant (x2(3) = 6.846, p = .077). However, the predictor for fathers’ residual facial dominance remains significant (b = .608, z = 2.267, p = .023). I will exponentiate the coefficients to make this more interpretable.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_main_effects_model_a_neutral$coefficients)
The odds ratio for father’s residual facial dominance is OR = 1.838, indicating that while holding mother’s residual facial dominance constant a one standard deviation increase in father’s residual facial dominance is associated with a 83.8% higher odds of having a first born son, while controlling for time since birth (child age).
Now we will summarize the model with the main effects and their interaction, and we will compare it’s fit to the intercept only model. After this we will compare it to the main effects model.
# Summarizing the interaction modelsummary(res_fac_dom_interaction_model_a_neutral)
Call:
glm(formula = shared_child_sex ~ z_m_res_facial_dominance + z_f_res_facial_dominance +
shared_child_age + z_m_res_facial_dominance:z_f_res_facial_dominance,
family = binomial(link = logit), data = neutral_face_dyadic_data)
Coefficients:
Estimate Std. Error z value
(Intercept) 0.08871 0.51215 0.173
z_m_res_facial_dominance 0.25021 0.25687 0.974
z_f_res_facial_dominance 0.70917 0.28899 2.454
shared_child_age 0.06841 0.11392 0.600
z_m_res_facial_dominance:z_f_res_facial_dominance 0.34338 0.27175 1.264
Pr(>|z|)
(Intercept) 0.8625
z_m_res_facial_dominance 0.3300
z_f_res_facial_dominance 0.0141 *
shared_child_age 0.5482
z_m_res_facial_dominance:z_f_res_facial_dominance 0.2064
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 110.19 on 80 degrees of freedom
Residual deviance: 101.64 on 76 degrees of freedom
AIC: 111.64
Number of Fisher Scoring iterations: 4
# Calculating the chi-squared statistic for the model compared to the baseline model with only the interceptChi_res_fac_dom_interaction_model_a_neutral <- res_fac_dom_interaction_model_a_neutral$null.deviance - res_fac_dom_interaction_model_a_neutral$devianceChi_res_fac_dom_interaction_model_a_neutral
[1] 8.552133
# Calculating the degrees of freedom for the chi-square statistic comparing the model to it's baselinedf_res_fac_dom_interaction_model_a_neutral <- res_fac_dom_interaction_model_a_neutral$df.null - res_fac_dom_interaction_model_a_neutral$df.residualdf_res_fac_dom_interaction_model_a_neutral
[1] 4
# Conducting a chi-square test for the p-value using the chi-square statistic and the degrees of freedomprob_Chi_res_fac_dom_interaction_model_a_neutral <-1-pchisq(Chi_res_fac_dom_interaction_model_a_neutral, df_res_fac_dom_interaction_model_a_neutral)prob_Chi_res_fac_dom_interaction_model_a_neutral
[1] 0.07332266
The full interaction model is not significant (x2(4) = 8.552, p = .073), but, although the interaction term is not significant, when the interaction term was added to the model the coefficient became larger for fathers’ residual facial dominance (b = .709, z = 2.454, p = .014). Again, I will exponentiate this coefficient to understand it better.
# Exponentiating the coefficients of the model to reveal the odds ratio conversion of the coefficientsexp(res_fac_dom_interaction_model_a_neutral$coefficients)
The odds ratio for father’s residual facial dominance is now 2.03, indicating that while holding mother’s residual facial dominance constant a one standard deviation increase in father’s residual facial dominance is associated with a 103% higher odds of having a first born son.
Now to compare the main-effects model with the main-effects and interaction model.
# Calculating the chi-square statistic to compare the interaction model with the main effects modelChi_res_fac_dom_interaction_model_a_v_main_neutral <- res_fac_dom_main_effects_model_a_neutral$deviance - res_fac_dom_interaction_model_a_neutral$devianceChi_res_fac_dom_interaction_model_a_v_main_neutral
[1] 1.706559
# Calculating the degrees of freedom to compare the interaction model with the main effects modeldf_res_fac_dom_interaction_model_a_v_main_neutral <- res_fac_dom_main_effects_model_a_neutral$df.residual - res_fac_dom_interaction_model_a_neutral$df.residualdf_res_fac_dom_interaction_model_a_v_main_neutral
The Chi-squared test does not indicate that the interaction model fits better than the main effects model (x2(1) = 1.707, p = .191).
Although the interaction is not significant in our model, our hypothesis predicts that at high levels of fathers’ facial dominance mothers’ dominance predicts the probability of a first born son, so I will run a simple slopes and Johnson-Neyman analysis with mothers’ residual facial dominance as the focal predictor and fathers’ residual facial dominance as the moderator.
# Conducting the simple slopes and Johnson-Neyman analysissim_slopes(res_fac_dom_interaction_model_a_neutral, pred = z_m_res_facial_dominance, modx = z_f_res_facial_dominance, jnplot =TRUE)
JOHNSON-NEYMAN INTERVAL
The Johnson-Neyman interval could not be found. Is the p value for your
interaction term below the specified alpha?
SIMPLE SLOPES ANALYSIS
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = -1.000000e+00 (- 1 SD):
Est. S.E. z val. p
------- ------ -------- ------
-0.09 0.31 -0.30 0.77
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = -2.193033e-17 (Mean):
Est. S.E. z val. p
------ ------ -------- ------
0.25 0.26 0.97 0.33
Slope of z_m_res_facial_dominance when z_f_res_facial_dominance = 1.000000e+00 (+ 1 SD):
Est. S.E. z val. p
------ ------ -------- ------
0.59 0.43 1.39 0.16
Although nothing here is significant, I will exponentiate the coefficients here to make them more interpretable.
# Producing odds-ratios for each value of the slope for mothers' residual facial dominance for each level of the moderator in the simple slopes analysisexp(-.09) # -1 SD
[1] 0.9139312
exp(.25) # Mean
[1] 1.284025
exp(.59) # +1 SD
[1] 1.803988
# Taking the inverse of the odds-ratio for the first coefficient1-0.9139312
[1] 0.0860688
The Johnson-Neyman analysis indicates that there are no values of fathers’ residual facial dominance for which mothers’ residual facial dominance is a significant predictor of the probability of having a first born child. Similar to the analysis with all facial expression types, however, the simple slopes analysis indicates that at low levels of fathers’ residual facial dominance (-1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 8.6% decrease in the odds of having a first born son; at the mean for fathers’ residual facial dominance a one standard deviation increase in mothers’ residual facial dominance is associated with an 28.4% increase in the odds of having a first born son; and at high levels of fathers’ residual facial dominance (+1 SD) a one standard deviation increase in mothers’ residual facial dominance is associated with a 80.4% increase in the odds of having a first born son.
Although not significant, the direction of this moderation effect is consistent with our hypothesis, like in the model with all facial expressions included.
Again, it is possible that in the population there exists such an effect, whereas we simply do not have enough power to detect it.
Clean-Up From Study 2 Analysis
I will now write the final versions of the analysis files that I have used as csv files so that they are not lost.
# Writing the final version of the dyadic dataset as "post_analysis_full_dyads.csv"write.csv(dyadic_data, file ="./data/post_analysis_dyadic_data_full.csv")# Writing the final version of the female-only individual data as "post_analysis_individual_data_females_only.csv"write.csv(female_individuals, file ="./data/post_analysis_individual_data_females_only.csv")# Writing the final version of the dyadic dataset with only neutral faces as "post_analysis_dyadic_data_neutral_faces_only"write.csv(neutral_face_dyadic_data, file ="./data/post_analysis_dyadic_data_neutral_faces_only.csv")# Remove the objects from the environmentrm(list =ls())
References
Buunk, A., & Fisher, M. (2009). Individual differences in intrasexual competition. Journal of Evolutionary Psychology, 7, 37–48. https://doi.org/10.1556/JEP.7.2009.1.5
Cheng, J. T., Tracy, J. L., & Henrich, J. (2010). Pride, personality, and the evolutionary foundations of human social status. Evolution and Human Behavior, 31(5), 334–347. https://doi.org/10.1016/j.evolhumbehav.2010.02.004
Forrest, S., Eatough, V., & Shevlin, M. (2005). Measuring Adult Indirect Aggression: The Development and Psychometric Assessment of the Indirect Aggression Scales. Aggressive Behavior, 31(1), 84–97. https://doi.org/10.1002/ab.20074
Ganzeboom, H. (2010, May 1). A new International Socio-Economic Index (ISEI) of occupational status for the International Standard Classification of Occupation 2008 (ISCO-08) constructed with data from the ISSP 2002–2007.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40(1), 84–96. https://doi.org/10.1016/j.jrp.2005.08.007
Grant, V. J. (1992). The measurement of dominance in pregnant women by use of the Simple Adjective Test. Personality and Individual Differences, 13(1), 99–102. https://doi.org/10.1016/0191-8869(92)90225-E
Johnson, P. O., & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1, 57–93.
Palmer-Hague, J. L., & Watson, N. V. (2016). Effects of Mother and Father Dominance on Offspring Sex in Contemporary Humans. Adaptive Human Behavior and Physiology, 2(1), 57–76. https://doi.org/10.1007/s40750-015-0032-6
Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., Preibisch, S., Rueden, C., Saalfeld, S., Schmid, B., Tinevez, J.-Y., White, D. J., Hartenstein, V., Eliceiri, K., Tomancak, P., & Cardona, A. (2012). Fiji - an Open Source platform for biological image analysis. Nature Methods, 9(7), 10.1038/nmeth.2019. https://doi.org/10.1038/nmeth.2019
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420. https://psycnet.apa.org/record/1979-25169-001